ARCHITECTURE BLUEPRINT

The M:N Compound Pricing Playbook for AI Agents

How enterprise AI platforms use Aforo's gateway-edge architecture to decouple infrastructure costs from Go-To-Market packaging, bill for “Business Value,” and prevent margin bleed.

Read the Technical Guide See the Architecture

THE TRAP

The 1:1 Billing Monolith

When enterprises build autonomous agents using frontier models like Anthropic's Claude, legacy billing providers force them into a rigid 1:1 mapping: one API call equals one charge. But a single LLM inference produces 4–8 distinct billable dimensions — input tokens, output tokens, cache reads, extended thinking, tool calls — each at a different rate.

This architectural mismatch creates two systemic problems:

Go-To-Market Friction

End-users — lawyers, analysts, marketers — don't want to buy “Cache Read Tokens” or “Extended Thinking Tokens.” They want to buy a “Contract Review” or an “Agent Run.” The 1:1 model leaks infrastructure complexity into your pricing page.

Margin Bleed and Latency

When an autonomous agent loops or escalates to a higher-cost model mid-task, the underlying COGS can silently exceed the customer-facing charge. Without real-time margin enforcement at the network edge, platforms resort to custom quota scripts that add latency and pull engineers off core product work.

THE BLUEPRINT

Gateway Integration & M:N Pricing

The modern standard for AI monetization requires two architectural pillars working in concert.

01

M:N Compound Pricing

Aforo's engine ingests multiple infrastructure metrics — input tokens, output tokens, vector DB lookups, MCP tool durations, cache hits — and composes them into a single, abstracted business-value charge:

$X.XXper “Business Value Unit”

02

Zero-Code Gateway Enforcement

Aforo acts as an asynchronous sidecar at the API Gateway. We process high-cardinality telemetry out-of-band and push post-call state updates back to Kong, Apigee, and AWS API Gateway. When cumulative COGS threatens gross margin, Aforo's Margin Guard updates the gateway state and blocks the next request — with literally zero latency added to your core inference hot path.

Async Sidecar · 0ms Inference Latency · Out-of-Band · fail-open

THE ARCHITECTURAL ROI

What changes when you get this right

0ms

Inference Latency Added

Aforo is a pure async sidecar. Telemetry is processed out-of-band. Post-call state is pushed back to the gateway. Your inference hot path is untouched.

Absolute

Margin Protection

Margin Guard trips the circuit breaker at the API gateway edge before a single unprofitable request reaches the inference engine. Silent revenue leakage is eliminated by design.

Zero

Jira Tickets for Pricing

Product managers compose new abstracted pricing tiers in the Aforo UI without engineering involvement. Pricing logic is fully decoupled from application code.

The Aforo Advantage

Abstract the complexity.
Protect the margin.

Aforo's M:N Compound Pricing lets you ingest raw infrastructure telemetry — tokens, vector DB lookups, MCP tool durations, cache hits — and compose it into a single, clean “Business Value” charge for your end customer. Your buyer sees “$0.85 per Contract Reviewed”. You see the five infrastructure meters underneath.

If an autonomous agent loops and the underlying compute threatens your gross margin, Aforo's Margin Guard trips the circuit breaker at the API gateway edge — before you lose a dollar. Infrastructure reality stays decoupled from your go-to-market packaging.

Deploy in Minutes

LIVE SIMULATOR

M:N Compound Pricing

M: RAW INFRASTRUCTURE TELEMETRY

Input Tokens

@ $3.00/M

12,000

Output Tokens

@ $15.00/M

4,500

Vector DB Lookups

@ $0.02/query

Total COGS$0.2235

N: CUSTOMER-FACING CHARGE

QTYITEMAMOUNT

1xContract Reviewed$0.85

TOTAL$0.85

MARGIN HEALTH

Gross Margin73.7%

0%40% safe100%

0

Billing dimensions per API call

0ms

Inference latency added

0

Composable pricing models

0%

Charge-to-request attribution

PRICING MODELS

Six models, one rate plan

A real LLM offering combines 3-4 of these within a single rate plan. Aforo composes them natively.

Model

LLM Use Case

Example

Per-Unit

Token pricing

$3.00/M input tokens

Graduated

Volume discounts

First 10M at $3, next 50M at $2.50

Volume-Tiered

Whole-volume rate

60M tokens → all at $2.00/M

Included Quota

Free tier

1M tokens free, then $5/M

Flat Rate

Platform fee

$200/month base

Percentage

Revenue share

2.5% of inference cost

HOW IT WORKS

Three stages from raw token to revenue

Compound event ingestion, real-time spend tracking, and zero-latency async enforcement — in that order.

01

Ingest

Emit a single compound event per API call. The platform decomposes it into rated events per metric.

One SDK call captures input tokens, output tokens, cache hits, batch flags, and model identifiers. The correlation ID links every decomposed event back to the original request for full auditability.

02

Rate

Each metric flows through its own pricing model. Spend counters update in real-time. Tier promotions trigger automatically.

Redis-backed spend counters evaluate tier thresholds on every event. When a customer crosses $50/month cumulative spend, their rate limits upgrade instantly -- no manual intervention, no invoice-time recalculation.

03

Enforce

Aforo runs as an async sidecar — processing telemetry out-of-band and pushing gateway state updates post-call. Zero latency added to the inference hot path. Denied requests get 429 + Retry-After on the next call.

The enforcement endpoint runs as a Kong plugin, AWS Lambda@Edge, or Azure APIM policy. It checks rate limits, wallet balance, and cumulative quota in a single Redis round-trip. If Redis is unavailable, the request passes through -- we bill retroactively rather than block paying customers.

Ready to monetize your AI Agents?

Aforo handles compound events at the gateway edge, spend-based tiers, and real-time quota enforcement—so you can focus on building your core product, not your monetization infrastructure.

Get Started Read the Technical Guide

The M:N Compound Pricing Playbook for AI Agents

The 1:1 Billing Monolith

Go-To-Market Friction

Margin Bleed and Latency

Gateway Integration & M:N Pricing

M:N Compound Pricing

Zero-Code Gateway Enforcement

What changes when you get this right

Abstract the complexity. Protect the margin.

Six models, one rate plan

Three stages from raw token to revenue

Ingest

Rate

Enforce

Ready to monetize your AI Agents?

Abstract the complexity.
Protect the margin.