Tenant isolation breach
SEV1 runbook for any cross-tenant data exposure or access path.
Triggered by: red-team harness flags cross-tenant access, OR a customer reports seeing another customer's data, OR an audit-trail review reveals cross-(org, mode) access.
On-call: Security (primary), Platform (secondary). Pager: SEV1, page within 60 seconds. Estimated MTTR: 2-8 hours.
Stop-the-bleed
- Freeze the affected access path:
  matter ops freeze-operation --op <operation_id> --reason "tenant isolation investigation"
- Capture forensics snapshot:
  matter ops snapshot-org --org-id <affected_org> --target /forensics/<incident>/
- Identify scope of exposure: Audit-on-read (P0.B6) provides the exhaustive list of cross-tenant reads.
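To narrow the audit-on-read output to the reads that matter for the affected org, a filter along the following lines may help. This is a minimal sketch: the `AuditRead` record shape and its field names are assumptions for illustration, not the actual P0.B6 schema.

```typescript
// Hypothetical shape of one audit-on-read record; the real P0.B6 schema may differ.
interface AuditRead {
  readerOrgId: string; // org that owns the token performing the read
  ownerOrgId: string;  // org that owns the data that was read
  resource: string;    // illustrative identifier, e.g. "invoice:123"
  at: string;          // ISO 8601 timestamp
}

// Keep only reads where reader and owner are different orgs and the
// affected org is on either side of the read.
function crossTenantReads(records: AuditRead[], affectedOrgId: string): AuditRead[] {
  return records.filter(
    (r) =>
      r.readerOrgId !== r.ownerOrgId &&
      (r.readerOrgId === affectedOrgId || r.ownerOrgId === affectedOrgId)
  );
}
```

Matching on both sides is deliberate: the affected org may be the one whose data leaked, the one whose token saw foreign data, or both.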
Diagnose
Three failure modes:
Mode A: ORM extension bypassed
Raw SQL was executed outside packages/database/src/. P0.B4 bans this,
so it should never happen, but ESLint exceptions exist. Audit the git
log for recently added escape hatches.
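One way to triage Mode A is to scan the files touched by recent commits for ESLint disable comments outside the allowed directory. The sketch below assumes the file contents have already been collected; `SourceFile`, `EscapeHatch`, and `findEscapeHatches` are hypothetical names, and the pattern matches any disabled rule because the name of the raw-SQL rule is not given here.

```typescript
// Hypothetical input: a repo-relative path plus the file's contents.
interface SourceFile {
  path: string;
  content: string;
}

interface EscapeHatch {
  path: string;
  line: number; // 1-based line number of the disable comment
  text: string;
}

const ALLOWED_PREFIX = "packages/database/src/";

// Flag every eslint-disable comment in files outside the allowed directory.
// Narrow the pattern to the specific raw-SQL rule once its name is known.
function findEscapeHatches(files: SourceFile[]): EscapeHatch[] {
  const hits: EscapeHatch[] = [];
  for (const file of files) {
    if (file.path.indexOf(ALLOWED_PREFIX) === 0) continue;
    file.content.split("\n").forEach((text, i) => {
      if (/eslint-disable/.test(text)) {
        hits.push({ path: file.path, line: i + 1, text: text.trim() });
      }
    });
  }
  return hits;
}
```

Each hit still needs a human check: a disable comment is only a lead, not proof that raw SQL ran.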
Mode B: Mode segregation broken
Live token reading sandbox data, or vice versa. Caused by mode
derivation failing in middleware. Check apps/api/lib/mode.ts.
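When reviewing the mode derivation, it can help to compare it against a guard that fails closed. The following is a sketch only, not the contents of apps/api/lib/mode.ts: the `Mode` type and `assertSameMode` function are hypothetical, and the mode names are taken from the live/sandbox wording above.

```typescript
type Mode = "live" | "sandbox";

// Hypothetical guard: refuse the request when either mode could not be
// derived, or when the token mode and the resource mode disagree.
function assertSameMode(tokenMode: Mode | undefined, resourceMode: Mode | undefined): Mode {
  if (tokenMode === undefined || resourceMode === undefined) {
    throw new Error("mode derivation failed; refusing request");
  }
  if (tokenMode !== resourceMode) {
    throw new Error(`mode mismatch: ${tokenMode} token cannot read ${resourceMode} data`);
  }
  return tokenMode;
}
```

The point of the sketch is the first branch: a derivation that falls back to a default mode on failure, instead of throwing, is the kind of bug that produces Mode B.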
Mode C: Scope DSL evaluation bug
Token scope policy evaluation returned the wrong verdict. Check the
packages/auth-api-key/src/scope-policy.ts golden-tests.
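A golden test for scope evaluation is essentially a table of inputs with pinned verdicts. The sketch below shows the pattern under stated assumptions: `evaluate` is a stand-in with exact-match, deny-by-default semantics, and the scope strings are invented for illustration. The real evaluator in scope-policy.ts is likely richer.

```typescript
type Verdict = "allow" | "deny";

interface GoldenCase {
  name: string;
  granted: string[]; // scopes carried by the token
  required: string;  // scope the operation needs
  expected: Verdict; // pinned verdict; a change here must be deliberate
}

// Stand-in evaluator: exact match, deny by default. Not the real DSL.
function evaluate(granted: string[], required: string): Verdict {
  return granted.indexOf(required) !== -1 ? "allow" : "deny";
}

// Illustrative cases; the scope names are hypothetical.
const goldenCases: GoldenCase[] = [
  { name: "exact scope allows", granted: ["orders:read"], required: "orders:read", expected: "allow" },
  { name: "missing scope denies", granted: ["orders:read"], required: "orders:write", expected: "deny" },
  { name: "empty token denies", granted: [], required: "orders:read", expected: "deny" },
];

// Return the names of cases whose verdict differs from the pinned one.
function runGolden(
  cases: GoldenCase[],
  evalFn: (granted: string[], required: string) => Verdict
): string[] {
  return cases
    .filter((c) => evalFn(c.granted, c.required) !== c.expected)
    .map((c) => c.name);
}
```

During an incident, the input that produced the wrong verdict would be added as a new case before the fix lands, so the table fails first and passes after.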
Recover
The exposed data cannot be unexposed; the goal is to:
- Patch the access path with a regression test.
- Notify the affected customer(s) per breach-notification SLA (24h max).
- Force-rotate all tokens that may have observed cross-tenant data:
  matter ops force-rotate-tokens --filter "issued_before:<incident_start>" --reason "tenant isolation incident"
- Recompute the impact: which customers' data was exposed to whom, for how long.
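The impact recomputation in the last step can be sketched as a grouping of cross-tenant reads by (data owner, reader) pair, with the first and last read bounding the exposure window. The record shape and all names below are assumptions for illustration, not the real audit schema.

```typescript
// Hypothetical cross-tenant read record; field names are illustrative.
interface CrossTenantRead {
  readerOrgId: string;
  ownerOrgId: string;
  at: string; // ISO 8601 timestamp
}

interface ExposureWindow {
  ownerOrgId: string;  // whose data was exposed
  readerOrgId: string; // to whom
  firstReadAt: string; // start of the observed exposure
  lastReadAt: string;  // end of the observed exposure
  readCount: number;
}

// Group reads by (owner, reader) and track the earliest and latest read.
function summarizeExposure(reads: CrossTenantRead[]): ExposureWindow[] {
  const byPair: { [key: string]: ExposureWindow | undefined } = {};
  const windows: ExposureWindow[] = [];
  for (const r of reads) {
    const key = JSON.stringify([r.ownerOrgId, r.readerOrgId]);
    const existing = byPair[key];
    if (existing === undefined) {
      const created: ExposureWindow = {
        ownerOrgId: r.ownerOrgId,
        readerOrgId: r.readerOrgId,
        firstReadAt: r.at,
        lastReadAt: r.at,
        readCount: 1,
      };
      byPair[key] = created;
      windows.push(created);
      continue;
    }
    existing.readCount += 1;
    if (Date.parse(r.at) < Date.parse(existing.firstReadAt)) existing.firstReadAt = r.at;
    if (Date.parse(r.at) > Date.parse(existing.lastReadAt)) existing.lastReadAt = r.at;
  }
  return windows;
}
```

The window covers observed reads only; the period during which the access path was open may be longer and has to come from the deploy and patch timestamps.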
Communicate
Per breach-notification SLA + regulatory requirements (GDPR Article 33: 72h; state laws vary):
- Email affected customers within 24 hours.
- Regulator notification per applicable law.
- Status page update only if exposure scope warrants (rarely; most isolation breaches are scoped).
- Public postmortem within 35 days (SEV1).
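The deadlines above are easy to miscount under pressure, so it may be worth computing them once at the start of the incident. The sketch below assumes every clock starts at the moment the team became aware of the breach. That holds for GDPR Article 33, but is an assumption for the customer SLA and the postmortem. `notificationDeadlines` and the field names are hypothetical.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface Deadlines {
  customerEmailBy: string;    // 24h breach-notification SLA
  gdprRegulatorBy: string;    // GDPR Article 33: 72h
  publicPostmortemBy: string; // 35 days for SEV1
}

// awareAt: ISO 8601 timestamp of when the team became aware of the breach.
// Assumption: all three clocks start at awareAt.
function notificationDeadlines(awareAt: string): Deadlines {
  const t0 = Date.parse(awareAt);
  if (isNaN(t0)) {
    throw new Error("invalid timestamp: " + awareAt);
  }
  return {
    customerEmailBy: new Date(t0 + 24 * HOUR_MS).toISOString(),
    gdprRegulatorBy: new Date(t0 + 72 * HOUR_MS).toISOString(),
    publicPostmortemBy: new Date(t0 + 35 * DAY_MS).toISOString(),
  };
}
```

All results are in UTC, which avoids daylight-saving surprises. State-law deadlines vary and are not modeled here.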
Post-recovery action items
- Regression test added to apps/api/__tests__/tenant-isolation/.
- Red-team harness extended to catch this class of issue.
- Threat model updated with the new attack surface.
- SOC 2 + ISO 27001 evidence captured for the audit trail.