How to Let AI Agents Make Payments Safely
TL;DR
- Five layers make agent payments safe: delegated authorization (scoped authority), policy enforcement at auth time (real-time rules), merchant and merchant category code (MCC) controls (where it can spend), velocity caps (how often), and human-in-the-loop fallbacks (when policy says "ask first").
- "Safe" doesn't mean "never wrong." It means: failures are bounded, auditable, recoverable, and fast to contain.
- The model below is what we run in production.
Letting an AI agent transact is letting software make decisions in milliseconds with real money. Safe doesn't mean infallible — agents will sometimes do things you didn't intend. Safe means failures are bounded, every charge has an audit trail, and you can stop a misbehaving agent in seconds.
What are the five safety layers?
Layer 1: Delegated authorization (scope)
Agents don't hold credit card credentials. They hold a delegation — a scoped, revocable authorization to spend on a specific card under specific rules. Card data stays vaulted. The agent's authority is bounded by the delegation.
Why this matters: a compromised agent (prompt injection, infrastructure breach) can only do what the delegation allows. Scope is the first line of defense.
See Delegated Payment Authorization.
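As a rough illustration of what "scoped, revocable" can mean in practice, a delegation can be modeled as a small record carrying a spending ceiling, an expiry, and a revocation flag. This is a minimal sketch under stated assumptions: the `Delegation` class, its field names, and the `tok_example` token are hypothetical and do not describe any platform's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Delegation:
    """Hypothetical record of an agent's spending authority (illustrative only)."""
    agent_id: str
    card_token: str          # opaque reference to the vaulted card, never the card number
    max_amount_cents: int    # per-transaction ceiling
    expires_at: datetime
    revoked: bool = False

    def is_active(self, now: datetime) -> bool:
        # A delegation is usable only if it is neither revoked nor expired.
        return not self.revoked and now < self.expires_at

    def permits(self, amount_cents: int, now: datetime) -> bool:
        # Authority is bounded by the delegation: active and within the ceiling.
        return self.is_active(now) and 0 < amount_cents <= self.max_amount_cents


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
delegation = Delegation(
    agent_id="agent-1",
    card_token="tok_example",
    max_amount_cents=50_00,
    expires_at=now + timedelta(days=30),
)
```

Because the agent only ever sees the token and the delegation, flipping `revoked` to `True` removes its authority without touching the underlying card.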
Layer 2: Policy enforcement at auth time
Every transaction is evaluated against the policy before authorization is granted. The engine supports more than 15 rule types, including amount, MCC, geography, time of day, velocity, and custom approval thresholds, and returns a decision in under 100 ms inside the auth path.
Why this matters: rules are enforced in the authorization path at the network, outside the agent's control. A misbehaving agent can bypass rules that live in its own application layer. It can't bypass rules enforced at the network.
See How We Built a 100ms Policy Engine.
Layer 3: Merchant + MCC controls (where)
A whitelist names the merchants the agent can transact with. A block list excludes merchant categories (gambling, crypto, tobacco, etc.). The two are layered: the whitelist is the outer boundary, and the MCC block list is a categorical filter inside it.
Why this matters: even within a sensible spending policy, you don't want an agent paying off-brand impostors or spending outside its purpose. Where-controls catch the "right amount, wrong merchant" failure mode.
See How to Prevent AI Agents From Spending at the Wrong Merchants.
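The layering of a merchant whitelist with an MCC block list can be sketched as two set lookups. This is an illustrative sketch, not a real configuration: the merchant IDs and the `check_merchant` helper are hypothetical, and the two MCCs shown (7995 for betting and gambling, 5993 for cigar stores and stands) are only examples of categories a policy might block.

```python
from typing import Optional

# Hypothetical example values; a real policy would load these from configuration.
ALLOWED_MERCHANTS = {"merchant_airline_a", "merchant_hotel_b"}
BLOCKED_MCCS = {
    "7995",  # betting and gambling
    "5993",  # cigar stores and stands
}


def check_merchant(merchant_id: str, mcc: str) -> Optional[str]:
    """Return a decline reason, or None if the merchant passes both filters."""
    # Outer boundary: the merchant must be on the whitelist.
    if merchant_id not in ALLOWED_MERCHANTS:
        return "merchant_not_allowed"
    # Categorical filter: even a whitelisted merchant is refused in a blocked category.
    if mcc in BLOCKED_MCCS:
        return "mcc_blocked"
    return None
```

Returning a distinct reason per filter keeps the "right amount, wrong merchant" case visible in the decline record.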
Layer 4: Velocity caps (how often)
Limits on transactions per minute, hour, day. Catches runaway loops, compromised credentials, and prompt-injection attacks that would otherwise fire many small charges fast.
Why this matters: many failure modes show up as elevated transaction frequency before they show up as suspicious amounts. Velocity caps act as both the early warning and the rate limiter.
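One common way to implement a per-window cap is a sliding-window counter over recent charge timestamps. The sketch below is one possible approach, not a description of any specific platform; the `VelocityLimiter` class and its parameters are hypothetical, and timestamps are plain seconds so the example stays deterministic.

```python
from collections import deque


class VelocityLimiter:
    """Sliding-window counter: at most `limit` charges per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # cap reached; refused attempts are not recorded
        self._timestamps.append(now)
        return True


limiter = VelocityLimiter(limit=3, window_seconds=60)
results = [limiter.allow(t) for t in (0, 1, 2, 3, 61)]
```

With a limit of three charges per 60 seconds, the fourth attempt at t=3 is refused, and the attempt at t=61 is allowed again because the two oldest charges have left the window.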
Layer 5: Human-in-the-loop fallbacks
For high-value or anomalous transactions, the policy says "ask first." The transaction is held; a notification fires to the user; user approves or denies. Policy variants:
- Per-transaction approval threshold ("approve every charge over $X").
- Pattern-based approval ("approve every charge to a new merchant").
- Time-window approval ("approve every charge outside business hours").
Why this matters: the four prior layers are automated. Human-in-the-loop is the fallback for cases where automation isn't enough — typically high-value or novel decisions.
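The three approval variants above can be sketched as independent predicates that each contribute a hold reason. This is a minimal sketch with assumed names: `Charge`, `needs_approval`, and the 09:00 to 18:00 business-hours default are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set, Tuple


@dataclass
class Charge:
    merchant_id: str
    amount_cents: int
    at: datetime  # local time of the charge


def needs_approval(
    charge: Charge,
    threshold_cents: int,
    known_merchants: Set[str],
    business_hours: Tuple[int, int] = (9, 18),
) -> List[str]:
    """Return the reasons a charge should be held; an empty list means no hold."""
    reasons = []
    if charge.amount_cents > threshold_cents:
        reasons.append("over_threshold")          # per-transaction threshold
    if charge.merchant_id not in known_merchants:
        reasons.append("new_merchant")            # pattern-based
    start, end = business_hours
    if not (start <= charge.at.hour < end):
        reasons.append("outside_business_hours")  # time-window
    return reasons
```

Collecting every matching reason, rather than stopping at the first, lets the approval notification tell the user exactly why the charge was held.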
How do these layers compose?
Order of evaluation in the auth path:
1. Delegation valid? If revoked/expired → decline.
2. Merchant whitelist OK? If not → decline.
3. MCC allowed? If blocked → decline.
4. Amount within cap? Velocity OK? Geo OK? Time OK? If not → decline.
5. Approval-required threshold hit? If yes → hold + notify human.
6. All checks pass → authorize.
Short-circuit on first failure. Decline reasons are explicit; you know which layer fired.
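The evaluation order above can be sketched as an ordered list of checks that stops at the first failure and returns an explicit reason. This is a simplified sketch: the `AuthRequest` fields stand in for the results of the real checks, and all names and reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AuthRequest:
    delegation_active: bool
    merchant_allowed: bool
    mcc_blocked: bool
    within_limits: bool      # amount, velocity, geo, and time checks combined
    approval_required: bool


# Each check returns a decline reason, or None to continue to the next check.
Check = Callable[[AuthRequest], Optional[str]]

DECLINE_CHECKS: List[Check] = [
    lambda r: None if r.delegation_active else "delegation_invalid",
    lambda r: None if r.merchant_allowed else "merchant_not_allowed",
    lambda r: "mcc_blocked" if r.mcc_blocked else None,
    lambda r: None if r.within_limits else "limit_exceeded",
]


def evaluate(request: AuthRequest) -> Tuple[str, Optional[str]]:
    """Return (decision, reason); decision is 'decline', 'hold', or 'authorize'."""
    for check in DECLINE_CHECKS:
        reason = check(request)
        if reason is not None:
            return ("decline", reason)  # short-circuit on first failure
    if request.approval_required:
        return ("hold", "approval_required")
    return ("authorize", None)
```

Because the checks run in a fixed order, a request that fails several layers reports only the earliest one, which is what makes the decline reason unambiguous.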
What does "safe" actually look like in production?
Concrete metrics from a production agent platform with the five layers operational:
- Decline rate from policy checks: ~3-5% (legitimate declines from policy, not fraud).
- Decline rate from fraud: ~0.1-0.3% (residual fraud caught by agent-aware models).
- Mean time to contain a misbehaving agent: <30 seconds (from the emergency-stop API call to all cards frozen).
- Audit trail completeness: 100% (every charge logged with full context).
- Compliance evidence pack on demand: queries return in seconds.
What's NOT covered by these layers?
Three remaining risks:
1. Logic errors in the agent itself. The agent decides to do the wrong thing within the rules. For example, an agent meant to handle normal travel booking buys a specific concert ticket that happens to fit every spending rule. The payment platform can't catch this, because it requires application-layer logic about what the agent is supposed to be doing.
2. Compromised user identity. If an attacker takes over the user's account, they can issue new delegations or modify existing ones. Account-security controls (2FA, anomaly detection on user behavior) are upstream of the payment layer.
3. Novel attack vectors. New attack patterns that the existing rules don't model. The agent-aware fraud model needs to keep up. See Agent-Aware Fraud Detection.
What about regulatory compliance?
The layers, together with the per-agent ledger they feed, map to specific regulatory expectations:
- Delegation = SCA exemption framework (PSD2 RTS on SCA, which includes exemptions for trusted beneficiaries in Article 13 and recurring transactions in Article 14). Delegated authorization is designed to fit these exemptions for low-risk recurring and agent-initiated transactions.
- Policy + audit trail = transaction monitoring (AML / EU 6th Directive). Every decision logged with reason.
- Human-in-the-loop = customer-explicit-consent flows. When required.
- Per-agent ledger = transaction reportability. Required for KYB and ongoing monitoring.
Not legal advice, but the model is built to fit existing regulatory frameworks.
FAQ
Is this enough for production?
For most agent products, yes. High-stakes verticals (financial agents trading real assets, healthcare agents making medical decisions) need additional layers — formal verification of agent behavior, regulatory pre-clearance, and operational SLAs that include rapid rollback.
What if the user wants no human-in-the-loop at all?
Possible. Layer 5 is optional. Configure the policy without approval thresholds. The other four layers still apply.
How do I handle a compromised delegation?
Revoke the delegation via API. The network starts refusing authorizations less than a second after revocation. Optionally, also terminate all cards under that delegation (destructive), send a notification to the user, and flag the agent for review.
Are these layers Shatale-specific?
The framework applies to any agent payment platform. Shatale implements all five natively. DIY stacks need to build each layer separately and ensure they share state.
Can agents learn to bypass these layers?
The layers are enforced server-side by the platform — the agent can't bypass them by changing its own code. What the agent CAN do is operate within the layers in unintended ways. Application-layer logic + monitoring is the answer to that.
Related reading
- Delegated Payment Authorization — Layer 1
- How We Built a 100ms Policy Engine — Layer 2
- How to Prevent AI Agents From Spending at the Wrong Merchants — Layer 3
- What Are Velocity Rules — Layer 4
- Agent-Aware Fraud Detection — adjacent layer
External references
- PSD2 RTS on SCA exemptions — delegation regulatory framework
- EU 6th Anti-Money Laundering Directive — transaction monitoring requirements
- Visa Delegated Authentication — network-level delegation framework
By Kristina Medvedeva. Last updated 2026-04-29.