Last updated Jun 14, 2026

Audit chain recovery (SEV1)

Triggered by: data-integrity-check cron (P0.G19) reports a broken link in the per-(org, mode) audit chain, OR a customer's verification endpoint returns "chain inconsistent."

On-call: Security (primary), Platform (secondary). Pager: SEV1, page within 60 seconds. Estimated MTTR: 30-90 minutes for a synthetic anomaly, 4+ hours for real corruption.

Stop-the-bleed (5 minutes)

Freeze writes to the affected (org, mode):

matter ops freeze-org --org-id <org_id> --mode <mode> --reason "audit-chain-break investigation"

Pin a snapshot of the AuditEntry table for forensics:

matter ops snapshot-audit --org-id <org_id> --mode <mode> --target /forensics/<incident_id>/

Cross-check Rekor anchor (P0.C4): every AuditEntry row from the last 24 hours should have a corresponding Rekor transparency log entry. Compare digests.
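The digest comparison can be sketched as below. This is a minimal TypeScript sketch, assuming the AuditEntry digests from the snapshot and the digests from the Rekor entries have already been exported as plain records. The `AnchoredDigest` shape and the `crossCheckRekor` name are hypothetical; they are not part of the matter CLI or of the Rekor API.

```typescript
// Hypothetical record shape: one digest per AuditEntry row, keyed by entry id.
interface AnchoredDigest {
  entryId: string;
  digest: string; // hex-encoded digest
}

interface CrossCheckResult {
  missingInRekor: string[]; // AuditEntry rows with no Rekor entry
  digestMismatch: string[]; // rows whose Rekor digest differs from the snapshot
}

// Compare digests taken from the forensics snapshot against digests fetched
// from Rekor. Hex casing is normalized before comparison.
function crossCheckRekor(
  auditRows: AnchoredDigest[],
  rekorEntries: AnchoredDigest[],
): CrossCheckResult {
  const rekorById = new Map<string, string>();
  for (const entry of rekorEntries) {
    rekorById.set(entry.entryId, entry.digest.toLowerCase());
  }

  const result: CrossCheckResult = { missingInRekor: [], digestMismatch: [] };
  for (const row of auditRows) {
    const anchored = rekorById.get(row.entryId);
    if (anchored === undefined) {
      result.missingInRekor.push(row.entryId);
    } else if (anchored !== row.digest.toLowerCase()) {
      result.digestMismatch.push(row.entryId);
    }
  }
  return result;
}
```

A non-empty `missingInRekor` and a non-empty `digestMismatch` are different signals, so the sketch reports them separately rather than returning a single pass/fail flag.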

Diagnose

Three failure modes:

Mode A: Stale read

The CDC consumer lagged, so the verification endpoint read state from before the latest write was visible. Mitigation: bounce (restart) the CDC consumer, then rerun the check. No data recovery is needed.

Mode B: Genuine break (extremely rare)

A row was inserted out of band. Possible causes:

Manual SQL outside packages/database/src/append-only.ts.
Replica corruption.
Storage-tier migration (P11.13) didn't preserve the chain.

Action: locate the bad row by comparing each row's prevHash with the hash of the row immediately before it. Work from the forensics snapshot; DO NOT mutate live data yet.
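Locating the bad row amounts to a linear scan over the snapshot. The following is a minimal TypeScript sketch, assuming rows carry a sequence number alongside `prevHash` and `hash`; the `ChainRow` shape, the null `prevHash` on the first row, and the `findFirstBrokenLink` name are hypothetical and chosen for illustration.

```typescript
// Hypothetical row shape; only prevHash and hash are named in this runbook.
interface ChainRow {
  sequence: number;
  prevHash: string | null; // assumed null for the first row of a chain
  hash: string;
}

// Returns the sequence number of the first row whose prevHash does not match
// the hash of the row before it, or null if every link is intact.
function findFirstBrokenLink(rows: ChainRow[]): number | null {
  // Sort a copy so the caller's snapshot data is left untouched.
  const ordered = [...rows].sort((a, b) => a.sequence - b.sequence);
  let previous: ChainRow | undefined;
  for (const row of ordered) {
    if (previous !== undefined && row.prevHash !== previous.hash) {
      return row.sequence;
    }
    previous = row;
  }
  return null;
}
```

This checks only the links between rows. It does not recompute each row's own hash from its content, so a row whose stored hash is itself wrong would need a separate check.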

Mode C: Adversarial tampering

The chain was rewritten by someone with write access.

Action: immediately rotate the audit chain pepper (P0.C5), invalidate sessions, and page Security leadership. This is a breach-level event.

Recover

For Mode B, and for Mode A only if the check still fails after the CDC consumer restart, recover from the canonical event log (P0.E5 event-sourcing-as-canonical):

matter ops rebuild-audit-chain \
  --org-id <org_id> \
  --mode <mode> \
  --from-sequence <break_point> \
  --verify-against-rekor

This walks the event log forward from the break point, recomputes hashes, and writes a new chain segment to the WORM bucket.
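The forward walk described above can be illustrated with a short TypeScript sketch. It is not the implementation behind `rebuild-audit-chain`: the `LoggedEvent` and `RebuiltEntry` shapes are hypothetical, and `hashFn` is an injected placeholder for the real hashing scheme (including any pepper), which this runbook does not specify.

```typescript
// Hypothetical event and chain-entry shapes, for illustration only.
interface LoggedEvent {
  sequence: number;
  payload: string; // assumed: canonical serialized form of the event
}

interface RebuiltEntry {
  sequence: number;
  prevHash: string;
  hash: string;
}

// Placeholder for the real hashing scheme, which is not specified here.
type HashFn = (prevHash: string, payload: string) => string;

// Walk the event log forward from breakPoint, chaining each recomputed hash
// onto the previous one, starting from the last hash known to be good.
function rebuildSegment(
  events: LoggedEvent[],
  lastGoodHash: string,
  breakPoint: number,
  hashFn: HashFn,
): RebuiltEntry[] {
  const toReplay = events
    .filter((event) => event.sequence >= breakPoint)
    .sort((a, b) => a.sequence - b.sequence);

  const segment: RebuiltEntry[] = [];
  let prevHash = lastGoodHash;
  for (const event of toReplay) {
    const hash = hashFn(prevHash, event.payload);
    segment.push({ sequence: event.sequence, prevHash, hash });
    prevHash = hash;
  }
  return segment;
}
```

The sketch returns the new segment instead of writing it anywhere, which keeps the recomputation separate from the write to the WORM bucket.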

For Mode C, restore from the most recent verified Rekor-anchored state, then replay events. The replayFromSnapshot path in apps/api/lib/cqrs-read-model.ts is the canonical implementation.

Validate

Run the canonical post-recovery suite:

matter ops verify-audit-chain --org-id <org_id> --mode <mode> --since 24h

Expected outputs:

Every AuditEntry verifies against the recomputed prevHash.
Every entry has a Rekor anchor within the SLA window.
The data-integrity-check cron's next run reports clean.
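The anchor-timing expectation can be spot-checked independently of the CLI. Below is a minimal TypeScript sketch, assuming each entry's creation time and Rekor anchor time are available; the `AnchorTiming` shape and `findAnchorSlaViolations` name are hypothetical, and the SLA window is passed as a parameter because its value is not stated in this runbook.

```typescript
// Hypothetical timing record for one AuditEntry.
interface AnchorTiming {
  entryId: string;
  createdAt: Date;
  anchoredAt: Date | null; // null when no Rekor anchor exists
}

// Returns the ids of entries that have no anchor, or whose anchor landed
// later than slaWindowMs after the entry was created.
function findAnchorSlaViolations(
  entries: AnchorTiming[],
  slaWindowMs: number,
): string[] {
  return entries
    .filter(
      (entry) =>
        entry.anchoredAt === null ||
        entry.anchoredAt.getTime() - entry.createdAt.getTime() > slaWindowMs,
    )
    .map((entry) => entry.entryId);
}
```

An empty result corresponds to the second expected output above; any returned id is an entry to investigate before closing the incident.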

Communicate

Per severity-matrix.mdx, SEV1 requires:

Status-page update within 15 minutes of declaration.
Customer email to the affected org within 1 hour.
Internal #incidents channel updated every 30 minutes.
Postmortem within 5 days, externally published within 35 days.
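The deadlines above reduce to simple date arithmetic. The TypeScript sketch below assumes every deadline is measured from the moment the SEV1 is declared and that "days" means 24-hour periods; both are assumptions, since only the status-page deadline is explicitly tied to declaration. The `sev1Deadlines` name and `Sev1Deadlines` shape are hypothetical.

```typescript
interface Sev1Deadlines {
  statusPageBy: Date;
  customerEmailBy: Date;
  nextIncidentsUpdateBy: Date;
  postmortemBy: Date;
  externalPublishBy: Date;
}

const MINUTE_MS = 60_000;
const DAY_MS = 24 * 60 * MINUTE_MS;

// Assumption: all deadlines count from declaredAt, except the recurring
// #incidents update, which counts from the most recent channel update.
function sev1Deadlines(
  declaredAt: Date,
  lastChannelUpdate: Date = declaredAt,
): Sev1Deadlines {
  const at = (base: Date, offsetMs: number): Date =>
    new Date(base.getTime() + offsetMs);
  return {
    statusPageBy: at(declaredAt, 15 * MINUTE_MS),
    customerEmailBy: at(declaredAt, 60 * MINUTE_MS),
    nextIncidentsUpdateBy: at(lastChannelUpdate, 30 * MINUTE_MS),
    postmortemBy: at(declaredAt, 5 * DAY_MS),
    externalPublishBy: at(declaredAt, 35 * DAY_MS),
  };
}
```

For example, an incident declared at 2026-06-14T12:00:00Z would, under these assumptions, need its status-page update by 12:15Z and its externally published postmortem by 2026-07-19T12:00:00Z.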

Post-recovery action items

Standard template:

Identify which control allowed the break.
Add a regression test to apps/api/__tests__/audit.test.ts.
Run the chaos drill on this scenario within 2 weeks to verify the break doesn't recur.
