How We Built a 100ms Policy Engine for AI Agent Transactions

TL;DR

When an AI agent makes a payment, the authorization request lands in our policy engine before it touches the network. We evaluate every transaction against a configurable rule set — and we have to do it fast enough not to blow the network's auth timeout. Here's how that's built.

Why does the latency budget matter so much?

Network authorization timeouts are unforgiving. Visa and Mastercard expect a response inside roughly 5 seconds end-to-end. Inside that, the issuer (us) gets a sub-second budget after subtracting acquirer hops, network latency, and 3DS challenge round-trips. Real working budget for our policy decision: ~100-150ms.

Anything slower triggers a "no answer" decline at the network — which looks identical to a server fault and burns trust with publishers. So the 100ms budget isn't aspirational; it's the threshold below which the system stops being a product.
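Because a budget breach looks like a server fault from the outside, it's worth measuring every decision against the deadline explicitly. A minimal sketch of that idea, assuming a hypothetical `evaluate` callable and decision shape (not our actual schema):

```python
import time

DECISION_BUDGET_MS = 100  # the policy engine's slice of the issuer's sub-second window

def evaluate_with_deadline(evaluate, txn, budget_ms=DECISION_BUDGET_MS):
    """Run a policy evaluation and record whether it fit the budget.

    `evaluate` is a hypothetical callable returning a decision dict.
    In production, a breach here surfaces as a "no answer" decline
    at the network, so breaches are worth alerting on.
    """
    start = time.monotonic()
    decision = evaluate(txn)
    elapsed_ms = (time.monotonic() - start) * 1000
    decision["elapsed_ms"] = elapsed_ms
    decision["within_budget"] = elapsed_ms <= budget_ms
    return decision
```

The wrapper doesn't cancel a slow evaluation; it just makes every breach observable, which is the prerequisite for keeping p99 honest.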

What does the engine actually evaluate?

15+ rule types across four families:

Amount controls

Merchant controls

Time controls

Advanced controls

Rules compose. A single agent might run "SaaS-only + $500/tx + business-hours-only + weekday-only" — that's 4 rules evaluated on every charge.
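Composition plus short-circuiting can be sketched in a few lines. Rule names and the `(ok, reason)` return shape here are illustrative assumptions, not the engine's real interface:

```python
# Hypothetical rule constructors; each returned rule takes a transaction
# dict and yields (ok, reason).
def max_amount(limit_cents):
    def rule(txn):
        ok = txn["amount_cents"] <= limit_cents
        return ok, None if ok else f"amount exceeds ${limit_cents / 100:.0f} limit"
    return rule

def mcc_allowlist(allowed):
    def rule(txn):
        ok = txn["mcc"] in allowed
        return ok, None if ok else f"MCC {txn['mcc']} not in allowlist"
    return rule

def evaluate(rules, txn):
    """Sequential evaluation, short-circuiting on the first failing rule."""
    for rule in rules:
        ok, reason = rule(txn)
        if not ok:
            return {"approved": False, "declined_by": reason}
    return {"approved": True}
```

A "SaaS-only + $500/tx" policy is then just `evaluate([mcc_allowlist(saas_mccs), max_amount(50_000)], txn)` — adding a rule is appending to the list.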

How is the 100ms budget actually spent?

Profiled at p99, the budget breaks down like this:

| Stage | Budget |
|-------|--------|
| Card-context lookup (denormalized cache) | ~10ms |
| Policy snapshot load (Redis-cached, immutable per version) | ~5ms |
| Rule evaluation (sequential, short-circuit on first failure) | 30-50ms |
| Ledger hold creation (within the auth transaction) | ~15ms |
| Webhook event enqueue (async via outbox table) | ~5ms |
| Headroom + jitter | ~15ms |
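Taking the worst case of each stage, the envelope closes exactly at 100ms — a quick sanity check of the table's arithmetic:

```python
# Worst-case per-stage budgets from the table above (ms).
BUDGET_MS = {
    "card_context_lookup": 10,
    "policy_snapshot_load": 5,
    "rule_evaluation": 50,   # upper end of the 30-50ms range
    "ledger_hold": 15,
    "webhook_enqueue": 5,
    "headroom_jitter": 15,
}

assert sum(BUDGET_MS.values()) == 100  # stages fit the 100ms envelope
```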

The non-obvious decisions:

What about explainability?

A decline with no reason is worse than no decline. So every evaluation outputs:

Publishers can drill into the dashboard and see, for any decline, which rule blocked it. Authoring policies is much faster when "why did this decline?" returns a sentence instead of a code.
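Explainability falls out naturally from sequential evaluation: record each rule's verdict as you go. A minimal sketch, with an illustrative trace shape (rule names and fields are assumptions, not our actual dashboard schema):

```python
def evaluate_with_trace(rules, txn):
    """Evaluate (name, rule) pairs in order, recording a per-rule trace.

    Each rule returns (ok, reason). On a decline, the trace shows
    exactly which rule fired and why; on approval, it shows that
    every rule passed.
    """
    trace = []
    for name, rule in rules:
        ok, reason = rule(txn)
        trace.append({"rule": name, "passed": ok, "reason": reason})
        if not ok:
            return {"approved": False, "declined_by": name, "trace": trace}
    return {"approved": True, "trace": trace}
```

The trace is what turns "why did this decline?" into a sentence instead of a code — the dashboard just renders the first failing entry.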

What we'd change

In hindsight:

What's next on the engine

Three things ship this quarter:

  1. Fraud scoring as a first-class rule type — pulling agent-aware behavioral signals into the eval path.
  2. Multi-acquirer routing — when one acquirer degrades, route the auth to a healthy one transparently.
  3. A/B testing for routing strategies — compare interchange-optimized routing vs latency-optimized vs success-rate-optimized.

FAQ

Why not run rules in parallel instead of sequentially?

Two reasons. First, the cheapest rules decline most transactions, so short-circuit evaluation has lower expected latency than parallel-and-aggregate. Second, parallel evaluation makes "which rule fired?" ambiguous — you'd need to define a precedence anyway, which puts you back at sequential.
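The expected-latency argument is easy to make concrete. With illustrative numbers (a cheap check that declines a meaningful fraction of traffic, and an expensive check behind it):

```python
# Illustrative numbers, not measured figures: a 2ms amount check
# declines 30% of traffic; a 40ms advanced check runs only when
# the cheap one passes.
t_cheap, t_expensive = 2.0, 40.0
p_decline_cheap = 0.30

# Sequential with short-circuit: always pay for the cheap rule,
# pay for the expensive one only on the 70% that survive.
sequential = t_cheap + (1 - p_decline_cheap) * t_expensive

# Parallel-and-aggregate: every transaction waits for the slowest rule.
parallel = max(t_cheap, t_expensive)

assert sequential < parallel  # 30ms expected vs 40ms every time
```

The gap widens as the cheap rules' decline rate rises, which is exactly the regime a policy engine lives in.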

What's the auth path latency without the policy engine?

For comparison: the bare-bones auth path (card lookup → ledger hold → response) runs in ~25-30ms p99. The policy engine adds 30-50ms on a typical transaction. Trade-off worth making.

How do you handle a policy update mid-transaction?

Each policy version is immutable. A transaction in flight evaluates against the snapshot it was issued under. A new version applies to the next transaction. This avoids race conditions and makes audit logs reproducible.
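The version-pinning scheme amounts to an append-only store of frozen snapshots. A minimal sketch (class and method names are illustrative, not our service's API):

```python
class PolicyStore:
    """Append-only store of immutable policy versions (illustrative)."""

    def __init__(self):
        self._versions = []  # index doubles as the version id

    def publish(self, rules):
        self._versions.append(tuple(rules))  # freeze the snapshot
        return len(self._versions) - 1

    def snapshot(self, version):
        return self._versions[version]

store = PolicyStore()
v0 = store.publish(["saas_only", "max_500"])
# A transaction in flight pins v0; publishing v1 cannot change what
# that transaction evaluates against, so audit replays are exact.
v1 = store.publish(["saas_only", "max_1000"])
assert store.snapshot(v0) == ("saas_only", "max_500")
```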

Can publishers write custom rules?

Yes — via a constrained expression language for amount/MCC/time/geo predicates. We don't allow arbitrary code (latency budget; security). Custom logic that doesn't fit the DSL goes through approval webhooks (the policy returns "approval required," your service decides).
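"Constrained expression language" here means a whitelist of fields and operators, nothing Turing-complete. A toy sketch of the idea — field names and the triple format are assumptions for illustration, not our DSL's actual syntax:

```python
# Only whitelisted operators and fields are evaluable; anything else
# is rejected, which is what keeps the eval path bounded and safe.
OPS = {
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
    "in": lambda a, b: a in b,
}
FIELDS = {"amount_cents", "mcc", "country"}

def eval_predicate(pred, txn):
    field, op, value = pred
    if field not in FIELDS or op not in OPS:
        raise ValueError("predicate outside the allowed DSL")
    return OPS[op](txn[field], value)

assert eval_predicate(("amount_cents", "<=", 50_000),
                      {"amount_cents": 12_000, "mcc": "5734", "country": "US"})
```

Because every predicate is a table lookup plus one comparison, worst-case cost is known at authoring time — arbitrary code would forfeit both that bound and the security story.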

Why is the budget 100ms and not 50ms?

Because we'd rather support more rule types than push to 50ms. The 100ms budget leaves the rest of the auth path enough headroom to handle 3DS, multi-acquirer retry, and acquirer-side latency variance without breaching the 5s network timeout.


By Vlad K. Last updated 2026-04-29.