Quotas and rate limits
Matter's per-tier rate limits and resource quotas. Soft-limit warnings at 80% via Matter-Quota-Warning header before hard 429s. Per-operation limits declared in spec via x-matter-rate-limit and x-matter-quota-tier.
Matter's quota system has two layers: per-token rate limits (sliding window, drives 429 responses) and per-account quotas (monthly resource consumption, drives plan-tier upgrades).
Per-operation limits are declared in the OpenAPI spec via x-matter-rate-limit and x-matter-quota-tier. The runtime enforces them. The customer-facing dashboard surfaces live consumption.
Plan tiers
| Tier | Rate limit class | Monthly quotas | Sandbox quotas | Test-mode quotas |
|---|---|---|---|---|
| Free | low | 25 entities, 100k API calls, 100 webhook events, 1k events delivered | 5 entities, 10k API calls | 25 entities, 100k API calls |
| Starter | medium | 250 entities, 1M API calls, 10k webhook events, 100k events delivered | 50 entities, 100k API calls | unlimited |
| Growth | high | 2.5k entities, 10M API calls, 100k webhook events, 1M events delivered | 500 entities, 1M API calls | unlimited |
| Enterprise | enterprise | unlimited (negotiated) | unlimited | unlimited |
Beyond the tier limits, per-operation rate limits apply within each tier. The spec declares them via x-matter-rate-limit per operation; the codegen at apps/api/scripts/generate-rate-limits.ts emits the runtime policy matrix.
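The following is a hypothetical sketch of the kind of transformation a codegen step like apps/api/scripts/generate-rate-limits.ts could perform. It walks the spec's paths, reads x-matter-rate-limit from each operation, and emits one policy row per route. It assumes that x-matter-rate-limit holds an operation-class name (the class names "read.cached" and "mutation.async" are invented for illustration, and "read.uncached" is borrowed from the /v1/limits response). The real extension format and script may differ.

```typescript
// Hypothetical sketch: spec annotations -> runtime policy matrix.
// ASSUMPTION: x-matter-rate-limit holds an operation-class name such as "read.uncached".

type Tier = "free" | "starter" | "growth" | "enterprise";

// Only these keys of an OpenAPI path item are operations; "parameters", "summary", etc. are skipped.
const HTTP_METHODS = new Set(["get", "put", "post", "delete", "patch", "head", "options", "trace"]);

interface Operation {
  "x-matter-rate-limit"?: string;
}

interface OpenApiDoc {
  paths: Record<string, Record<string, unknown>>;
}

// Per-minute limits per class (subset of the illustrative table; class names are assumed).
const CLASS_LIMITS: Record<string, Record<Tier, number>> = {
  "read.cached": { free: 1000, starter: 5000, growth: 25000, enterprise: 100000 },
  "read.uncached": { free: 60, starter: 300, growth: 1500, enterprise: 10000 },
  "mutation.async": { free: 20, starter: 100, growth: 500, enterprise: 3000 },
};

interface PolicyRow {
  route: string; // e.g. "POST /v1/entities"
  operationClass: string;
  perMinute: Record<Tier, number>;
}

function buildPolicyMatrix(doc: OpenApiDoc): PolicyRow[] {
  const rows: PolicyRow[] = [];
  for (const [path, pathItem] of Object.entries(doc.paths)) {
    for (const [method, value] of Object.entries(pathItem)) {
      if (!HTTP_METHODS.has(method)) continue;
      const route = `${method.toUpperCase()} ${path}`;
      const cls = (value as Operation)["x-matter-rate-limit"];
      // Fail the build instead of shipping an operation with no declared limit.
      if (cls === undefined) throw new Error(`${route}: missing x-matter-rate-limit`);
      const perMinute = CLASS_LIMITS[cls];
      if (perMinute === undefined) throw new Error(`${route}: unknown class "${cls}"`);
      rows.push({ route, operationClass: cls, perMinute });
    }
  }
  return rows;
}
```

Failing the build on a missing annotation is one way to keep "the spec is the source of truth" enforceable. A lenient generator would instead silently leave some routes without a limit.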
Per-operation rate limits (selected)
Limits below are illustrative — the source of truth is x-matter-rate-limit in the spec.
| Operation class | Free | Starter | Growth | Enterprise |
|---|---|---|---|---|
| Read (cached, e.g., GET /v1/api-versions) | 1000/min | 5000/min | 25000/min | 100000/min |
| Read (uncached, e.g., GET /v1/entities) | 60/min | 300/min | 1500/min | 10000/min |
| Read (computed, e.g., GET /v1/entities/{id}/cap_table) | 20/min | 100/min | 500/min | 5000/min |
| Mutation (sync, e.g., POST /v1/tokens) | 30/min | 150/min | 750/min | 5000/min |
| Mutation (async, e.g., POST /v1/entities) | 20/min | 100/min | 500/min | 3000/min |
| Composite saga start (e.g., POST /v1/entities/{id}/formation_packet) | 5/min | 25/min | 100/min | 500/min |
| Bulk operation | 1/min | 5/min | 25/min | 100/min |
| Customer data export | 1/day | 5/day | 25/day | unlimited |
Test-mode tokens receive a bucket 10× larger than the tier default, so automated test runs are not throttled.
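Per-token rate limits use a sliding window. The following is a minimal in-memory sketch of that behavior for one (token, operation class) pair. It uses the "sliding-window log" variant and applies the 10× test-mode bucket. This is an illustration under stated assumptions. Matter's distributed runtime may use a different sliding-window algorithm, and the class and method names here are hypothetical.

```typescript
// Hypothetical in-memory sliding-window-log limiter for one (token, operation class) pair.
// ASSUMPTION: illustrates semantics only; the production runtime is not shown here.

class SlidingWindowLimiter {
  private readonly capacity: number;
  private hits: number[] = []; // timestamps (ms) of admitted requests, oldest first

  constructor(limitPerWindow: number, private readonly windowMs = 60_000, testMode = false) {
    // Test-mode tokens get a 10x larger bucket.
    this.capacity = testMode ? limitPerWindow * 10 : limitPerWindow;
  }

  // Forget hits that have slid out of the window (now - windowMs, now].
  private prune(now: number): void {
    while (this.hits.length > 0 && this.hits[0] <= now - this.windowMs) this.hits.shift();
  }

  // Admit or reject one request; on rejection, retryAfterMs is when the oldest hit expires.
  tryAcquire(now: number): { allowed: boolean; retryAfterMs: number } {
    this.prune(now);
    if (this.hits.length < this.capacity) {
      this.hits.push(now);
      return { allowed: true, retryAfterMs: 0 };
    }
    return { allowed: false, retryAfterMs: this.hits[0] + this.windowMs - now };
  }

  // Fraction of capacity used in the current window (0.8 is the soft-warning point).
  usage(now: number): number {
    this.prune(now);
    return this.hits.length / this.capacity;
  }
}
```

The retryAfterMs value is what a Retry-After header would be derived from. A log-based window is exact but stores one timestamp per request. Counter-based approximations trade that precision for constant memory.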
Soft-limit warnings
When a token's consumption reaches 80% of its rate limit for an operation:
- The response carries a Matter-Quota-Warning: <op>; usage=<x>; limit=<y>; reset=<unix> header.
- The dashboard surfaces a warning notification.
- An email is sent to the account billing contact (at most once per 24 hours per operation).
When an account's consumption reaches 80% of its monthly quota for any resource (entities, API calls, webhook events, events delivered):
- The response carries a Matter-Quota-Warning: <resource>; usage=<x>; limit=<y>; reset=<month_end> header.
- The dashboard surfaces a prominent warning.
- An email is sent to the account billing contact.
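Clients that want to react before hitting a 429 can read the Matter-Quota-Warning header. The following sketch parses the documented format and computes the 80% threshold. It assumes that parameters are ";"-separated key=value pairs exactly as shown. Because the <month_end> reset format is not specified, reset is kept as a raw string. The function names are hypothetical.

```typescript
// Hypothetical client-side parser for "Matter-Quota-Warning: <subject>; usage=<x>; limit=<y>; reset=<r>".
// ASSUMPTION: reset stays a raw string because the month-end format is not documented.

interface QuotaWarning {
  subject: string; // operation (rate limit) or resource (monthly quota)
  usage: number;
  limit: number;
  reset: string; // unix seconds for rate limits; month-end marker for quotas
}

function parseQuotaWarning(header: string): QuotaWarning | null {
  const [subject, ...params] = header.split(";").map((part) => part.trim());
  if (!subject) return null;
  const fields = new Map<string, string>();
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq > 0) fields.set(param.slice(0, eq), param.slice(eq + 1));
  }
  const usage = Number(fields.get("usage"));
  const limit = Number(fields.get("limit"));
  const reset = fields.get("reset");
  if (!Number.isFinite(usage) || !Number.isFinite(limit) || reset === undefined) return null;
  return { subject, usage, limit, reset };
}

// 80% of the limit, rounded up; multiplying before dividing keeps round limits exact.
function warningThreshold(limit: number): number {
  return Math.ceil((limit * 4) / 5);
}
```

For example, warningThreshold(1500) is 1200 and warningThreshold(2500) is 2000. These match the warning_threshold values in the GET /v1/limits sample response.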
At 100% of a rate limit: requests return 429 op_rate_limit_exceeded with a Retry-After header indicating when the window resets.
At 100% of a monthly quota: new mutations against the exhausted resource return 429 op_quota_exceeded, with a detail message containing the upgrade-path link. The account's existing data stays readable and continues to serve traffic; only new mutations against the exhausted resource are rejected.
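The two 429 codes call for different client behavior. A rate-limit 429 is worth retrying after Retry-After. A quota 429 will keep failing until the month resets or the plan changes. The following sketch shows one way a client could decide what to do. It assumes that the caller has already extracted the error code from the response body, whose exact shape is not specified here. The 60-second fallback when Retry-After is missing is also an assumption, chosen to match the per-minute windows.

```typescript
// Hypothetical retry planner for Matter's two kinds of 429.
// ASSUMPTIONS: the error code is already extracted from the body; 60 s fallback delay.

type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "stop"; reason: string };

function planRetry(code: string, retryAfter: string | null, nowMs: number): RetryDecision {
  if (code === "op_quota_exceeded") {
    // Monthly quota: retrying before the month resets (or an upgrade) cannot succeed.
    return { kind: "stop", reason: "monthly quota exhausted; see upgrade path in error detail" };
  }
  if (code !== "op_rate_limit_exceeded") {
    return { kind: "stop", reason: `not a rate-limit error: ${code}` };
  }
  if (retryAfter !== null) {
    // Retry-After is either delay-seconds or an HTTP-date.
    const seconds = Number(retryAfter);
    if (retryAfter.trim() !== "" && Number.isFinite(seconds)) {
      return { kind: "retry", delayMs: Math.max(0, seconds * 1000) };
    }
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) return { kind: "retry", delayMs: Math.max(0, dateMs - nowMs) };
  }
  // Header missing or unparseable: wait one full per-minute window.
  return { kind: "retry", delayMs: 60_000 };
}
```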
Quota and limit introspection
Customers can query their own quota status:
GET /v1/limits
Authorization: Bearer <token>
Returns:
{
  "object": "limits",
  "tier": "growth",
  "rate_limits": [
    {
      "operation_class": "read.uncached",
      "limit_per_minute": 1500,
      "current_usage": 432,
      "warning_threshold": 1200
    }
  ],
  "monthly_quotas": [
    {
      "resource": "entities",
      "limit": 2500,
      "current_usage": 1893,
      "warning_threshold": 2000,
      "reset_at": "2026-06-01T00:00:00Z"
    }
  ]
}
The customer dashboard (P0.I7) renders this live.
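A client can turn the GET /v1/limits response into a simple headroom report. The types below mirror only the fields shown in the sample response. Any other fields, and the way Enterprise "unlimited" values are represented, are not covered. The headroom function is a hypothetical helper, not part of an official SDK.

```typescript
// Types mirror the sample GET /v1/limits response; unlisted fields are omitted.
interface RateLimitStatus {
  operation_class: string;
  limit_per_minute: number;
  current_usage: number;
  warning_threshold: number;
}

interface MonthlyQuotaStatus {
  resource: string;
  limit: number;
  current_usage: number;
  warning_threshold: number;
  reset_at: string; // ISO 8601
}

interface LimitsResponse {
  object: "limits";
  tier: string;
  rate_limits: RateLimitStatus[];
  monthly_quotas: MonthlyQuotaStatus[];
}

interface Headroom {
  name: string;
  remaining: number;
  warning: boolean; // true once usage has reached the soft-warning threshold
}

// Hypothetical helper: remaining capacity per rate limit and per monthly quota.
function headroom(limits: LimitsResponse): Headroom[] {
  const rate = limits.rate_limits.map((r) => ({
    name: `rate:${r.operation_class}`,
    remaining: Math.max(0, r.limit_per_minute - r.current_usage),
    warning: r.current_usage >= r.warning_threshold,
  }));
  const monthly = limits.monthly_quotas.map((q) => ({
    name: `quota:${q.resource}`,
    remaining: Math.max(0, q.limit - q.current_usage),
    warning: q.current_usage >= q.warning_threshold,
  }));
  return [...rate, ...monthly];
}
```

Applied to the sample response, this sketch reports 1068 uncached reads left in the current minute and 607 entities left this month, with neither past its warning threshold.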
Quota dashboard
Customers can monitor usage at https://app.mattermode.com/usage. The dashboard shows:
- Live consumption per resource and per operation class.
- A 30-day usage trend.
- A headroom indicator (how much capacity remains in the current window).
- Soft-limit warning history.
- An upgrade path with one-click plan changes.
What gets rate-limited vs what gets queued
Rate-limited (429):
- Synchronous read operations.
- Synchronous mutations that complete in the request lifecycle.
Queued or accepted with degraded performance (no 429; the SLO is honored with eventual completion):
- Async mutations (e.g., POST /v1/entities): the 202 is returned, the Request resource is created, and a worker dispatches the job when capacity allows.
- Webhook deliveries: backpressure is applied on the outbound queue, per resilience primitives Truth 1.
- Background jobs: priority-queued.
The rationale: rate limits should protect Matter's runtime from overload, not penalize customers for legitimate async patterns.
What is NOT counted toward quotas
The following operations and responses do not consume quota:
- Error responses (4xx + 5xx). A 400 validation error or 5xx infrastructure error does not count toward the customer's rate limit.
- Replay-by-idempotency-key requests. The replay returns the original response; the second call does not consume an additional quota unit.
- Dry-run requests (?dry_run=true). These do not change state, but they still consume a fraction of a quota unit (they are rate-limited at 10× the underlying operation's limit) to deter abuse.
- Webhook receiver retries. Matter's retry attempts toward the customer's endpoint do not consume the customer's quota.
- Customer-initiated data export. It has its own per-day quota.
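The counting rules above can be expressed as a single pure function. The following is a hypothetical sketch. The document says only that dry runs cost "a fraction of a quota unit", so DRY_RUN_COST = 0.1 is an assumption: it is one reading of the 10× rate-limit rule. The MeteredEvent fields are invented for illustration. Data export is left out because it is metered against its own per-day quota.

```typescript
// Hypothetical encoding of "what is NOT counted toward quotas".
// ASSUMPTION: DRY_RUN_COST = 0.1 (one reading of "10x the underlying limit").

interface MeteredEvent {
  status: number; // final HTTP status returned to the customer
  idempotentReplay: boolean; // served from the stored response for a repeated idempotency key
  dryRun: boolean; // ?dry_run=true
  matterWebhookRetry: boolean; // Matter retrying delivery to the customer's endpoint
}

const DRY_RUN_COST = 0.1; // hypothetical fraction of a quota unit

function quotaCost(e: MeteredEvent): number {
  if (e.matterWebhookRetry) return 0; // Matter's own retries never consume customer quota
  if (e.status >= 400) return 0; // 4xx and 5xx responses are not counted
  if (e.idempotentReplay) return 0; // replay returns the original response
  if (e.dryRun) return DRY_RUN_COST; // no state change, but not free, to deter abuse
  return 1;
}
```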
Customer-initiated quota adjustments
Beyond the standard plan tiers, customers may negotiate:
- Burst capacity: a temporarily higher rate limit for a known event (e.g., a Series A close package executing for 100 portfolio companies in one weekend). Submit requests to support@mattermode.com at least 72 hours in advance.
- Sustained high traffic: if a customer's traffic pattern consistently exceeds what the Growth tier sustains, the customer migrates to the Enterprise tier.
- Per-operation overrides: Enterprise customers may negotiate specific operation limits beyond the tier defaults.
Why rate limits exist
Three reasons.
- Cost protection for Matter (and indirectly for customers). Unconstrained traffic would push the per-operation cost budget into the red, and Matter would have to pass those costs on to customers.
- Multi-tenant fairness. A single customer cannot consume all of a region's capacity and degrade service for others.
- Anomaly detection signal. A customer's token suddenly hitting the rate limit is a leading indicator that something is wrong (compromise, integration bug, abusive pattern).
Rate limits are the floor, not the ceiling. Most customers do not hit them; those who do receive proactive comms.
See also
- SLA — availability commitments.
- Metering — what's billable.
- SLOs — internal performance targets.
- API consistency suite — automated checks for limit uniformity.