Last updated Jun 14, 2026

Load shed engagement (SEV3)

Triggered by: queue depth > threshold OR p99 > 2× SLO sustained for ≥ 5 minutes. apps/api/lib/middleware/load-shed.ts engages, returning 503 + Retry-After on non-critical requests.

On-call: Platform.

What load-shed preserves vs drops

Preserved (always):

Writes (any mutation).
Audit endpoints.
Dissolution path.
Webhook delivery.

Dropped under shed:

List endpoints (read).
Reports + expand-heavy reads.
Non-essential webhooks.

Diagnose

matter ops shed-status

Identify the trigger:

Real load spike — surge mode also engaged (P0.E15).
Slow query — one slow op slowing the whole pool.
Provider-side dependency lagging — provider failover may apply.
Memory pressure — leak; bounce affected pods.

Recover

Address the underlying cause; load-shed exits automatically once queue depth + p99 drop below threshold for 5 minutes.

For manual exit:

matter ops shed-disengage --reason "<incident_id>"

(Use sparingly; auto-exit is the safe path.)

Validate

matter ops verify-load-shed-disengaged

Expected:

Queue depth < 50% of threshold.
p99 < SLO.
No 503s for the last 10 minutes.

Post-recovery

Tune the trigger threshold if it engaged spuriously.
Tune the per-op "critical vs non-critical" classification if a real-critical-op got shed.

Last updated Jun 14, 2026

Load shed engagement (SEV3)

Triggered by: queue depth > threshold OR p99 > 2× SLO sustained for ≥ 5 minutes. apps/api/lib/middleware/load-shed.ts engages, returning 503 + Retry-After on non-critical requests.

On-call: Platform.

What load-shed preserves vs drops

Preserved (always):

Writes (any mutation).
Audit endpoints.
Dissolution path.
Webhook delivery.

Dropped under shed:

List endpoints (read).
Reports + expand-heavy reads.
Non-essential webhooks.

Diagnose

matter ops shed-status

Identify the trigger:

Real load spike — surge mode also engaged (P0.E15).
Slow query — one slow op slowing the whole pool.
Provider-side dependency lagging — provider failover may apply.
Memory pressure — leak; bounce affected pods.

Recover

Address the underlying cause; load-shed exits automatically once queue depth + p99 drop below threshold for 5 minutes.

For manual exit:

matter ops shed-disengage --reason "<incident_id>"

(Use sparingly; auto-exit is the safe path.)

Validate

matter ops verify-load-shed-disengaged

Expected:

Queue depth < 50% of threshold.
p99 < SLO.
No 503s for the last 10 minutes.

Post-recovery

Tune the trigger threshold if it engaged spuriously.
Tune the per-op "critical vs non-critical" classification if a real-critical-op got shed.

Load shed engagement

Load shed engagement (SEV3)

What load-shed preserves vs drops

Diagnose

Recover

Validate

Post-recovery

On this page

Load shed engagement

Load shed engagement (SEV3)

What load-shed preserves vs drops

Diagnose

Recover

Validate

Post-recovery

On this page