Quotas and rate limits

Last updated Jun 14, 2026

Matter's quota system has two layers: per-token rate limits (sliding window, drives 429 responses) and per-account quotas (monthly resource consumption, drives plan-tier upgrades).
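As a rough illustration of the sliding-window layer, the sketch below keeps a log of request timestamps and rejects a request once the window is full. It is a minimal sketch of the general technique only: the class name, the 60-second default, and the injectable clock are assumptions, and nothing here describes how Matter's runtime actually implements its window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter: at most `limit` requests per `window_seconds`.

    Hypothetical helper for illustration; not Matter's implementation.
    """

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps = deque()  # request times still inside the window

    def allow(self):
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # a server would answer 429 at this point
        self._timestamps.append(now)
        return True

    def retry_after(self):
        """Seconds until the oldest request leaves the window (0 if there is room)."""
        if len(self._timestamps) < self.limit:
            return 0.0
        elapsed = self._clock() - self._timestamps[0]
        return max(0.0, self.window_seconds - elapsed)
```

A timestamp log is exact but stores one entry per request; counter-based approximations trade some precision for less memory.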

Per-operation limits are declared in the OpenAPI spec via x-matter-rate-limit and x-matter-quota-tier. The runtime enforces them. The customer-facing dashboard surfaces live consumption.

Plan tiers

Tier	Rate limit class	Monthly quotas	Sandbox quotas	Test-mode quotas
Free	low	25 entities, 100k API calls, 100 webhook events, 1k events delivered	5 entities, 10k API calls	25 entities, 100k API calls
Starter	medium	250 entities, 1M API calls, 10k webhook events, 100k events delivered	50 entities, 100k API calls	unlimited
Growth	high	2.5k entities, 10M API calls, 100k webhook events, 1M events delivered	500 entities, 1M API calls	unlimited
Enterprise	enterprise	unlimited (negotiated)	unlimited	unlimited

Beyond the tier limits, per-operation rate limits apply within each tier. The spec declares them via x-matter-rate-limit per operation; the codegen at apps/api/scripts/generate-rate-limits.ts emits the runtime policy matrix.
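To make the spec-to-policy step concrete, here is a minimal sketch of collecting the two extension fields from a parsed OpenAPI document into a per-operation matrix. The real codegen lives in the TypeScript script named above and is not reproduced here; the function name, the output shape, and the assumption that the extensions sit directly on each operation object are all illustrative. The extension values are passed through untouched because their schema is not documented on this page.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def build_policy_matrix(spec):
    """Map 'METHOD /path' to the rate-limit extensions declared on that operation.

    Hypothetical sketch: the output shape is an assumption, not the
    format emitted by generate-rate-limits.ts.
    """
    matrix = {}
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            rate_limit = operation.get("x-matter-rate-limit")
            if rate_limit is None:
                continue  # operation declares no per-operation limit
            matrix[f"{method.upper()} {path}"] = {
                "rate_limit": rate_limit,
                "quota_tier": operation.get("x-matter-quota-tier"),
            }
    return matrix
```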

Per-operation rate limits (selected)

Limits below are illustrative — the source of truth is x-matter-rate-limit in the spec.

Operation class	Free	Starter	Growth	Enterprise
Read (cached, e.g., `GET /v1/api-versions`)	1000/min	5000/min	25000/min	100000/min
Read (uncached, e.g., `GET /v1/entities`)	60/min	300/min	1500/min	10000/min
Read (computed, e.g., `GET /v1/entities/{id}/cap_table`)	20/min	100/min	500/min	5000/min
Mutation (sync, e.g., `POST /v1/tokens`)	30/min	150/min	750/min	5000/min
Mutation (async, e.g., `POST /v1/entities`)	20/min	100/min	500/min	3000/min
Composite saga start (e.g., `POST /v1/entities/{id}/formation_packet`)	5/min	25/min	100/min	500/min
Bulk operation	1/min	5/min	25/min	100/min
Customer data export	1/day	5/day	25/day	unlimited

Test-mode tokens receive a 10× relaxed bucket so that test runs are not throttled.
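The lookup below sketches how the per-minute rows of the table combine with the test-mode relaxation. The figures are copied from the illustrative table above. Only `read.uncached` appears as a class name elsewhere on this page; the other keys, the function name, and the reading of "10× relaxed" as a limit ten times higher are assumptions.

```python
# Per-minute limits copied from the illustrative table; the spec remains
# the source of truth. Class names other than "read.uncached" are assumed.
RATE_LIMITS_PER_MINUTE = {
    "read.cached": {"free": 1000, "starter": 5000, "growth": 25000, "enterprise": 100000},
    "read.uncached": {"free": 60, "starter": 300, "growth": 1500, "enterprise": 10000},
    "read.computed": {"free": 20, "starter": 100, "growth": 500, "enterprise": 5000},
    "mutation.sync": {"free": 30, "starter": 150, "growth": 750, "enterprise": 5000},
    "mutation.async": {"free": 20, "starter": 100, "growth": 500, "enterprise": 3000},
}

TEST_MODE_MULTIPLIER = 10  # assumption: "10x relaxed" means ten times the limit


def effective_limit(operation_class, tier, test_mode=False):
    """Return the per-minute limit for a class and tier, relaxed for test mode."""
    base = RATE_LIMITS_PER_MINUTE[operation_class][tier]
    return base * TEST_MODE_MULTIPLIER if test_mode else base
```

Under these assumptions, a Growth test-mode token reading uncached resources would get 15000 requests per minute instead of 1500.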

Soft-limit warnings

When a token's consumption reaches 80% of its rate limit for an operation:

The response carries a Matter-Quota-Warning: <op>; usage=<x>; limit=<y>; reset=<unix> header.
The dashboard surfaces a warning notification.
An email is sent to the account billing contact (at most once per 24 hours per operation).

When an account's consumption reaches 80% of its monthly quota for any resource (entities, API calls, webhook events, events delivered):

The response carries a Matter-Quota-Warning: <resource>; usage=<x>; limit=<y>; reset=<month_end> header.
The dashboard surfaces a prominent warning.
An email is sent to the account billing contact.
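A client can watch for the warning header and slow down before it reaches a hard limit. The parser below is a sketch based only on the header layout shown above; the function name and returned keys are hypothetical. It leaves `reset` as a raw string, because the rate-limit variant carries a Unix timestamp while the encoding of the month-end value is not specified here.

```python
def parse_quota_warning(value):
    """Parse '<subject>; usage=<x>; limit=<y>; reset=<r>' into a dict.

    Hypothetical helper; `subject` is the operation or resource name.
    """
    subject, *params = [part.strip() for part in value.split(";")]
    fields = {}
    for param in params:
        if not param:
            continue  # tolerate a trailing semicolon
        key, _, raw = param.partition("=")
        fields[key.strip()] = raw.strip()
    usage = int(fields["usage"])
    limit = int(fields["limit"])
    return {
        "subject": subject,
        "usage": usage,
        "limit": limit,
        "reset": fields["reset"],  # kept as a string; format differs per variant
        "fraction_used": usage / limit,
    }
```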

At 100% of a rate limit, requests return 429 op_rate_limit_exceeded with a Retry-After header indicating when the window resets.

At 100% of a monthly quota, requests return 429 op_quota_exceeded, with detail containing the upgrade-path link. The account's existing rows continue to read and serve traffic; only new mutations against the exhausted resource are rejected.
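The two 429 codes call for different client behaviour: a rate-limit rejection clears when the window resets, while a quota rejection will not clear until the month ends or the plan changes. The sketch below encodes that decision. The function names, the returned dictionary, and the one-second fallback when the header is missing are assumptions. It accepts both forms of Retry-After allowed by HTTP (delay in seconds or an HTTP date), since this page does not say which one Matter sends.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Convert a Retry-After value (delay seconds or HTTP date) to seconds."""
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (parsedate_to_datetime(value) - now).total_seconds())


def plan_after_429(error_code, retry_after=None, now=None):
    """Decide what a client should do after a 429 (hypothetical helper)."""
    if error_code == "op_rate_limit_exceeded":
        # Assumed fallback of 1 second if the header is absent.
        delay = retry_delay_seconds(retry_after, now) if retry_after else 1.0
        return {"action": "retry", "delay_seconds": delay}
    if error_code == "op_quota_exceeded":
        # Retrying cannot help; surface the upgrade path to a human instead.
        return {"action": "stop", "delay_seconds": None}
    return {"action": "raise", "delay_seconds": None}
```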

Quota and limit introspection

Customers can query their own quota status:

GET /v1/limits
Authorization: Bearer <token>

Returns:

{
  "object": "limits",
  "tier": "growth",
  "rate_limits": [
    {
      "operation_class": "read.uncached",
      "limit_per_minute": 1500,
      "current_usage": 432,
      "warning_threshold": 1200
    }
  ],
  "monthly_quotas": [
    {
      "resource": "entities",
      "limit": 2500,
      "current_usage": 1893,
      "warning_threshold": 2000,
      "reset_at": "2026-07-01T00:00:00Z"
    }
  ]
}

The customer dashboard (P0.I7) renders this live.
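An integration can compute its own headroom from the same payload. The sketch below works on the response body shown above after it has been decoded from JSON; the function names and the output rows are hypothetical. It assumes every limit is a number, and how an unlimited Enterprise limit is encoded is not covered on this page.

```python
def _row(name, limit, entry):
    usage = entry["current_usage"]
    return {
        "name": name,
        "headroom": limit - usage,
        "warning": usage >= entry["warning_threshold"],
    }


def summarize_limits(payload):
    """Flatten a /v1/limits body into rows of remaining capacity.

    Hypothetical helper; assumes numeric limits throughout.
    """
    rows = []
    for entry in payload.get("rate_limits", []):
        rows.append(_row(entry["operation_class"], entry["limit_per_minute"], entry))
    for entry in payload.get("monthly_quotas", []):
        rows.append(_row(entry["resource"], entry["limit"], entry))
    return rows
```

For the sample response, this gives a headroom of 1068 requests per minute for `read.uncached` and 607 entities for the month, with neither past its warning threshold.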

Quota dashboard

Customers see live consumption at https://app.mattermode.com/usage. The dashboard shows:

Live consumption per resource and per operation class.
30-day usage trend.
Headroom indicator (how much capacity remains in the current window).
Soft-limit warning history.
Upgrade path with one-click plan change.

What gets rate-limited vs what gets queued

Rate-limited (429):

Synchronous read operations.
Synchronous mutations that complete in the request lifecycle.

Queued or accepted with degraded performance (no 429; SLO honored with eventual completion):

Async mutations (POST /v1/entities): a 202 is returned, the Request resource is created, and the worker dispatches when capacity allows.
Webhook deliveries: backpressure is applied on the outbound queue, per resilience primitives Truth 1.
Background jobs: priority-queued.

The rationale: rate limits should protect Matter's runtime from overload, not penalise customers for legitimate async patterns.

What is NOT counted toward quotas

The following operations and responses do not consume quota, or consume it only at a reduced rate:

Error responses (4xx and 5xx). A 400 validation error or a 5xx infrastructure error does not count toward the customer's rate limit.
Replay-by-idempotency-key requests. The replay returns the original response; the second call does not consume an additional quota unit.
Dry-run requests (?dry_run=true). These do not change state, but each one consumes a fraction of a quota unit (dry-runs are rate-limited at 10× the underlying operation's limit) to deter abuse.
Webhook receiver retries. Matter's retry attempts toward the customer's endpoint do not consume the customer's quota.
Customer-initiated data export. Exports are metered separately, under their own per-day limit.
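The accounting rules in this list can be summarised as a small cost function. This is a sketch, not Matter's metering code: the function name and parameters are hypothetical, and the 0.1 cost for a dry-run is an assumption derived from the statement that dry-runs are limited at 10× the underlying operation's limit.

```python
DRY_RUN_COST = 0.1  # assumption: a 10x limit read as one tenth of a unit per call


def quota_units(status_code, is_idempotent_replay=False, dry_run=False):
    """Return how many quota units a completed request consumes."""
    if status_code >= 400:
        return 0.0  # 4xx and 5xx responses are not counted
    if is_idempotent_replay:
        return 0.0  # the original call already paid for this response
    if dry_run:
        return DRY_RUN_COST
    return 1.0
```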

Customer-initiated quota adjustments

Beyond the standard plan tiers, customers may negotiate:

Burst capacity: a temporarily higher rate limit for a known event (e.g., a Series A close package executing for 100 portfolio companies in one weekend). Submit the request to support@mattermode.com at least 72 hours in advance.
Sustained high traffic: the customer's sustained traffic exceeds what the Growth tier supports. Migrate to the Enterprise tier.
Per-operation overrides: Enterprise customers may negotiate specific operation limits beyond the tier defaults.

Why rate limits exist

Three reasons.

Cost protection for Matter (and indirectly for customers). Unconstrained traffic would push the per-operation cost-per-request budget into the red, and Matter would have to pass those costs on to customers.
Multi-tenant fairness. A single customer cannot consume all of a region's capacity and degrade service for others.
Anomaly detection signal. A customer's token suddenly hitting the rate limit is a leading indicator that something is wrong (compromise, integration bug, abusive pattern).

Rate limits are the floor, not the ceiling. Most customers never hit them; those who do receive proactive communication.
