How We Built a 100ms Policy Engine for AI Agent Transactions
TL;DR
- The policy engine evaluates 15+ rule types per transaction in under 100ms p99 — inside the network authorization timeout, leaving headroom for the rest of the auth path.
- The latency budget is spent on: card-context lookup (denormalized, ~10ms), policy load (cached, ~5ms), sequential rule evaluation with short-circuit (~30-50ms), ledger hold creation (~15ms), and webhook outbox enqueue (~5ms).
- Every decline returns a reason code AND a human-readable explanation. Both go in the agent's audit trail.
When an AI agent makes a payment, the authorization request lands in our policy engine before it touches the network. We evaluate every transaction against a configurable rule set — and we have to do it fast enough not to blow the network's auth timeout. Here's how that's built.
Why does the latency budget matter so much?
Network authorization timeouts are unforgiving. Visa and Mastercard expect a response inside roughly 5 seconds end-to-end. Inside that, the issuer (us) gets a sub-second budget after subtracting acquirer hops, network latency, and 3DS challenge round-trips. Real working budget for our policy decision: ~100-150ms.
Anything slower triggers a "no answer" decline at the network — which looks identical to a server fault and burns trust with publishers. So the 100ms budget isn't aspirational; it's the threshold past which the system stops being a product.
What does the engine actually evaluate?
15+ rule types across four families:
Amount controls
- Single-transaction limit
- Daily, weekly, monthly spend caps
- Approval-required threshold (sets a hold instead of authorizing)
Merchant controls
- MCC allow/deny lists (e.g. allow SaaS, block gambling)
- Specific merchant allow/deny by MID
- Country (geo) allow/deny
Time controls
- Business-hours window (configurable UTC)
- Weekday-only / weekend-only
- Emergency-stop flag (one toggle disables the agent)
Advanced controls
- Currency allow list
- Subscription-merchant lock (one card, one merchant, one subscription)
- Budget-threshold alerts
- Velocity caps (transactions per minute / hour / day)
Rules compose. A single agent might run "SaaS-only + $500/tx + business-hours-only + weekday-only" — that's 4 rules evaluated on every charge.
How is the 100ms budget actually spent?
Profiling p99 latency, the time goes here:
| Stage | Budget |
|-------|--------|
| Card-context lookup (denormalized cache) | ~10ms |
| Policy snapshot load (Redis-cached, immutable per version) | ~5ms |
| Rule evaluation (sequential, short-circuit on first failure) | 30-50ms |
| Ledger hold creation (within the auth transaction) | ~15ms |
| Webhook event enqueue (async via outbox table) | ~5ms |
| Headroom + jitter | ~15ms |
The non-obvious decisions:
- Denormalize aggressively. We don't normalize card → agent → policy at auth time. We materialize the joined view at issuance and at policy update, so authorization is a single lookup.
- Snapshot policies, don't read them live. Policy edits create a new immutable version. The snapshot cache hit rate is ~99.5%. Live reads only happen on first miss after a publish.
- Short-circuit on first failure. Rules ordered by selectivity (most-likely-to-decline first) and by cheapness. Geo block is cheaper than velocity, so it runs first when both are configured.
- Defer everything async-able. Webhooks, audit log writes (beyond the ledger hold), fraud-model retraining signals — all go through an outbox pattern. The auth response leaves the building before they fire.
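The outbox pattern in that last point can be sketched as follows — a toy sqlite3 version with an assumed schema, not our actual store. The key property: the webhook event row commits in the same transaction as the ledger hold, so the auth response never waits on delivery, but no event is lost if the process dies after commit:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE ledger_holds (txn_id TEXT PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE webhook_outbox (
        id INTEGER PRIMARY KEY,
        event TEXT NOT NULL,
        delivered INTEGER NOT NULL DEFAULT 0
    );
""")

def authorize(txn_id: str, amount_cents: int) -> None:
    # One atomic transaction: the hold and the outbox row commit together.
    with db:
        db.execute("INSERT INTO ledger_holds VALUES (?, ?)",
                   (txn_id, amount_cents))
        db.execute("INSERT INTO webhook_outbox (event) VALUES (?)",
                   (f"auth.approved:{txn_id}",))
    # A separate worker polls `webhook_outbox WHERE delivered = 0`,
    # POSTs each event, and marks delivered = 1 on success.

authorize("txn_123", 4999)
pending = db.execute(
    "SELECT event FROM webhook_outbox WHERE delivered = 0"
).fetchall()
```

The auth path pays only the cost of one extra `INSERT` inside a transaction it was already committing — that's the ~5ms enqueue line in the budget table.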
What about explainability?
A decline with no reason is worse than no decline. So every evaluation outputs:
- A machine-readable reason code (`mcc_blocked`, `velocity_exceeded`, `geo_denied`, etc.)
- A human-readable explanation ("Merchant category 7995 (gambling) is blocked by policy 'TravelBot v1'.")
- The exact rule that fired (rule ID + policy version)
Publishers can drill into the dashboard and see, for any decline, which rule blocked it. Authoring policies is much faster when "why did this decline?" returns a sentence instead of a code.
What we'd change
In hindsight:
- We over-engineered the rule DSL early. The first version supported nested boolean expressions. We shipped it, found that 95% of policies use ≤5 simple rules, and now expose templates instead of the DSL by default.
- Webhook delivery semantics. First version was at-most-once with a single retry. Publishers wanted at-least-once with idempotency keys. Re-architecting that mid-flight cost more than designing for it from day one.
- We don't model agent-specific fraud yet. Velocity rules are time-based, not behavior-based. A separate post on agent-aware fraud detection covers what we're building next.
What's next on the engine
Three things ship this quarter:
- Fraud scoring as a first-class rule type — pulling agent-aware behavioral signals into the eval path.
- Multi-acquirer routing — when one acquirer degrades, route the auth to a healthy one transparently.
- A/B testing for routing strategies — compare interchange-optimized routing vs latency-optimized vs success-rate-optimized.
FAQ
Why not run rules in parallel instead of sequentially?
Two reasons. First, the cheapest rules decline most transactions, so short-circuit evaluation has lower expected latency than parallel-and-aggregate. Second, parallel evaluation makes "which rule fired?" ambiguous — you'd need to define a precedence anyway, which puts you back at sequential.
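The expected-latency claim is easy to check with toy numbers (the costs and decline rates below are made up for illustration): sequential evaluation pays each rule's cost only if every earlier rule passed, while parallel-and-aggregate always pays for the slowest rule.

```python
# (cost_ms, probability this rule declines the transaction),
# ordered cheap-and-selective first, as in production.
rules = [
    (1.0, 0.30),  # geo block: cheap, fires often
    (2.0, 0.20),  # MCC deny
    (8.0, 0.02),  # velocity: expensive, rarely fires
]

# Sequential with short-circuit: rule i's cost is paid only with
# probability that all earlier rules passed.
seq, p_reached = 0.0, 1.0
for cost, p_decline in rules:
    seq += p_reached * cost
    p_reached *= 1 - p_decline

# Parallel-and-aggregate: every rule runs; latency is the slowest rule.
par = max(cost for cost, _ in rules)

print(f"sequential ≈ {seq:.2f}ms, parallel = {par:.1f}ms")
# → sequential ≈ 6.88ms, parallel = 8.0ms
```

Under these assumptions sequential wins even before counting the fan-out/aggregation overhead that parallel evaluation would add — and the gap widens as the cheap rules' decline rates rise.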
What's the auth path latency without the policy engine?
For comparison: the bare-bones auth path (card lookup → ledger hold → response) runs in ~25-30ms p99. The policy engine adds 30-50ms on a typical transaction. Trade-off worth making.
How do you handle a policy update mid-transaction?
Each policy version is immutable. A transaction in flight evaluates against the snapshot it was issued under. A new version applies to the next transaction. This avoids race conditions and makes audit logs reproducible.
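In sketch form, with an in-memory store standing in for the real one (the function names and store shape are assumptions): publishes append a new version, and an in-flight transaction reads the version it was pinned to at issuance, never "latest".

```python
policies: dict[tuple[str, int], dict] = {}  # (policy_id, version) -> rules
latest: dict[str, int] = {}

def publish(policy_id: str, rules: dict) -> int:
    """Append a new immutable version; existing versions are never mutated."""
    version = latest.get(policy_id, 0) + 1
    policies[(policy_id, version)] = rules
    latest[policy_id] = version
    return version

def load_for_txn(policy_id: str, pinned_version: int) -> dict:
    # In-flight transactions read their pinned snapshot, not latest.
    return policies[(policy_id, pinned_version)]

v1 = publish("TravelBot", {"max_tx_cents": 500_00})
v2 = publish("TravelBot", {"max_tx_cents": 250_00})  # mid-flight edit
assert load_for_txn("TravelBot", v1)["max_tx_cents"] == 500_00
```

The transaction pinned to `v1` still sees the $500 limit even after the $250 edit publishes; only the next transaction picks up `v2`.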
Can publishers write custom rules?
Yes — via a constrained expression language for amount/MCC/time/geo predicates. We don't allow arbitrary code (latency budget; security). Custom logic that doesn't fit the DSL goes through approval webhooks (the policy returns "approval required," your service decides).
Why is the budget 100ms and not 50ms?
Because we'd rather support more rule types than push to 50ms. The 100ms budget leaves the rest of the auth path enough headroom to handle 3DS, multi-acquirer retry, and acquirer-side latency variance without breaching the 5s network timeout.
Related reading
- Why AI Agents Need Their Own Payment Infrastructure — the why behind the engine
- How AI Agents Use Virtual Cards for Autonomous Payments — the issuing → auth flow this engine plugs into
- Get Your First Agent Payment in 5 Minutes — see the engine in sandbox
External references
- EMVCo 3D Secure 2.x specification — authentication adjacent to policy decisions
- PCI DSS v4.0 requirement 6.4.3 (custom application security) — relevant for the policy DSL boundary
By Vlad K. Last updated 2026-04-29.