Incident communications template
How Matter communicates during operational incidents. Exact wording, cadence, and channels per severity. Published for enterprise transparency.
Every Matter incident at SEV1 or SEV2 carries a public communications obligation: a status page update within 15 minutes (SEV1) or 30 minutes (SEV2) of declaration, a customer email within 30–60 minutes, and repeated updates on a fixed cadence until resolution. SEV3 carries an internal-only status-page entry. SEV4 carries no comms.
The templates below are the canonical wording. Variants are allowed only when the incident type genuinely calls for one.
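The obligations above can be written down as a small lookup table. The sketch below is illustrative only: the names `CommsPolicy`, `POLICY`, and `status_page_deadlines` are hypothetical, the minute values are copied from the SEV1 and SEV2 templates in this runbook, and it assumes each subsequent update is due one cadence interval after the previous deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class CommsPolicy:
    public: bool                     # is there a public status-page entry?
    first_update_min: Optional[int]  # status-page deadline after declaration
    cadence_min: Optional[int]       # interval between subsequent updates


# Values taken from the templates in this runbook; None means "no comms".
POLICY: dict[str, Optional[CommsPolicy]] = {
    "SEV1": CommsPolicy(public=True, first_update_min=15, cadence_min=30),
    "SEV2": CommsPolicy(public=True, first_update_min=30, cadence_min=60),
    "SEV3": CommsPolicy(public=False, first_update_min=None, cadence_min=None),
    "SEV4": None,
}


def status_page_deadlines(severity: str, declared_at: datetime,
                          count: int = 3) -> list[datetime]:
    """Return the first `count` public status-page deadlines for an incident."""
    policy = POLICY[severity]
    if policy is None or not policy.public:
        return []  # SEV3 is internal-only, SEV4 has no comms
    first = declared_at + timedelta(minutes=policy.first_update_min)
    # Assumption: each later update is due one cadence interval after the last.
    return [first + timedelta(minutes=policy.cadence_min * i) for i in range(count)]


if __name__ == "__main__":
    declared = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    for due in status_page_deadlines("SEV1", declared):
        print(due.isoformat())
```

Keeping the numbers in one table means a cadence change touches a single place rather than every template.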
Channels
| Channel | Purpose | Audience |
|---|---|---|
| Status page (status.mattermode.com) | Authoritative source of truth. Every public update lands here first. | All customers + the public. |
| Email (transactional) | Direct notification to customers in the affected scope. | Active customers whose tokens touched affected endpoints in the last 24h. |
| Slack notification (where customer has connected the dashboard) | Real-time signal to engineering teams using Matter. | Active customers who have opted in. |
| Internal #incident-<yyyymmdd>-<short-name> channel | Real-time engineering coordination. | Internal only. |
| Engineering all-hands | SEV1 sustained > 30 min or audit-chain integrity failure. | Internal only. |
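Two rows of the table are mechanical enough to sketch in code: the internal channel name and the email audience. Both helpers below are hypothetical. The slug rule (lowercase, non-alphanumerics collapsed to `-`) is an assumption, and `last_seen` stands in for whatever record maps a customer ID to its most recent request against an affected endpoint.

```python
import re
from datetime import date, datetime, timedelta, timezone


def incident_channel_name(declared_on: date, short_name: str) -> str:
    """Build '#incident-<yyyymmdd>-<short-name>' from a date and a free-text name."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become one '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain at least one letter or digit")
    return f"#incident-{declared_on:%Y%m%d}-{slug}"


def email_audience(last_seen: dict[str, datetime], now: datetime) -> list[str]:
    """Customer IDs whose tokens touched an affected endpoint in the last 24h."""
    cutoff = now - timedelta(hours=24)
    return sorted(cid for cid, seen in last_seen.items() if seen >= cutoff)


if __name__ == "__main__":
    now = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
    print(incident_channel_name(now.date(), "EU API Latency"))
    print(email_audience({"cust_a": now - timedelta(hours=1),
                          "cust_b": now - timedelta(hours=25)}, now))
```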
Status page template (SEV1)
Initial update (within 15 minutes of declaration):
Investigating — <Component> degraded
We are investigating reports of <user-visible symptom> affecting <scope: all customers / EU customers / etc.>. Customers may experience <concrete impact>. We have engaged on-call and engineering leadership. Next update in 30 minutes.
Subsequent updates (every 30 minutes):
Identified — <root cause statement>
We have identified the root cause as <plain description, no jargon>. <What we are doing>. Customer impact is <description>. <If known, ETA; otherwise: We will provide an updated ETA in the next status update.>
Resolution:
Resolved — <Component> restored
The incident has been fully resolved as of <UTC timestamp>. <Brief summary of root cause and fix>. A full post-mortem will be published within 14 days at https://docs.mattermode.com/postmortems/<slug>.
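A template is only safe to publish once every placeholder has been filled. The sketch below shows one way to enforce that, under stated assumptions: the `<...>` placeholders are rewritten as Python `{name}` fields, and the field names `symptom`, `scope`, and `impact` as well as the `render` helper are hypothetical, not part of Matter's tooling.

```python
from string import Formatter

# Body of the SEV1 initial update, with placeholders renamed to format fields.
SEV1_INITIAL_BODY = (
    "We are investigating reports of {symptom} affecting {scope}. "
    "Customers may experience {impact}. "
    "We have engaged on-call and engineering leadership. "
    "Next update in 30 minutes."
)


def render(template: str, **fields: str) -> str:
    """Fill a template, refusing to render if any placeholder is missing or blank."""
    # Formatter().parse yields (literal, field_name, format_spec, conversion).
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = sorted(n for n in required if not fields.get(n, "").strip())
    if missing:
        raise ValueError(f"unfilled placeholders: {', '.join(missing)}")
    return template.format(**fields)


if __name__ == "__main__":
    print(render(
        SEV1_INITIAL_BODY,
        symptom="elevated 500 responses",
        scope="EU customers",
        impact="failed API calls",
    ))
```

Failing loudly on a blank field keeps a half-filled template from reaching the status page during a stressful incident.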
Email template (SEV1)
Subject: Matter API incident in progress
Hi <name>,
We are currently investigating a Matter API incident that may be affecting your integration. The latest details are on our status page: https://status.mattermode.com.
What you might be seeing: <user-visible symptom>.
What we are doing: <plain description of the response>.
Affected scope: <which endpoints / regions / customer scope>.
We will email a resolution notice as soon as the incident is closed. Your account ID for reference is <id>. If you need to talk to a person, reply to this email or page us at https://status.mattermode.com/contact.
— Matter on-call
Status page template (SEV2)
Initial update (within 30 minutes of declaration):
Monitoring — <Component> degraded for <scope>
We are monitoring degraded performance on <component> for <scope: a specific region / a specific feature / a specific endpoint>. <Concrete user impact>. We have a mitigation in progress. Next update in 60 minutes.
Subsequent updates (every 60 minutes):
Identified / Mitigation in progress — <root cause>
We have <identified root cause / deployed mitigation / failed over to backup provider>. Current customer impact: <description>. Next update in 60 minutes or at resolution, whichever is sooner.
Resolution:
Resolved — <Component> restored
The degradation has been fully resolved as of <UTC timestamp>. <Brief summary>. A post-mortem will be published within 21 days at https://docs.mattermode.com/postmortems/<slug>.
Email template (SEV2)
Subject: Matter API degraded for <component> — monitoring
Hi <name>,
We are monitoring degraded performance on <component> that may affect your integration. The full status update is at https://status.mattermode.com.
What you might be seeing: <user-visible symptom>.
Affected scope: <which endpoints / regions>.
If your traffic is in an unaffected scope, no action is required. If you are affected, retries with exponential backoff will resolve transient failures as we restore service.
— Matter on-call
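The SEV2 email asks affected customers to retry with exponential backoff. For readers who have not implemented that before, here is a minimal client-side sketch. It is not Matter SDK code: `TransientError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and attempt count are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (for example an HTTP 5xx)."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Upper bound for each wait: base * 2**n, never more than cap."""
    return [min(cap, base * 2 ** n) for n in range(retries)]


def call_with_backoff(fn: Callable[[], T], attempts: int = 5,
                      base: float = 0.5, cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on TransientError with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Full jitter: wait a random time up to the exponential bound,
            # so many clients do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

The random jitter matters during recovery: without it, every client that failed at the same moment retries at the same moment.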
Status page template (SEV3, internal-only)
Internal status page entry only. No public update. No customer email.
Internal — <SLO> breached on <component>
p99 latency on <operation> exceeds budget. Investigating. No customer-visible impact. Ticket: <link>.
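For reference, the "p99 exceeds budget" condition in the SEV3 entry can be checked as follows. This is a sketch using the nearest-rank percentile definition, which is an assumption: the runbook does not say how Matter computes p99, and `p99` and `breaches_budget` are hypothetical helpers.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile: smallest sample with >= 99% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.99 * n), computed in integers to avoid float rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def breaches_budget(samples_ms: list[float], budget_ms: float) -> bool:
    """True when the p99 latency is strictly above the budget."""
    return p99(samples_ms) > budget_ms


if __name__ == "__main__":
    latencies = [float(ms) for ms in range(1, 101)]  # 1 ms .. 100 ms
    print(p99(latencies), breaches_budget(latencies, budget_ms=98.0))
```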
Post-mortem template
Lives at apps/docs/content/docs/runbooks/postmortem-template.mdx (lands P11.19). The skeleton:
- Summary: one paragraph, no jargon.
- Customer impact: scope, duration, observed symptoms.
- Timeline: UTC timestamps of declaration, first mitigation, full resolution.
- Root cause: technical narrative.
- What went well, what went wrong: unflinching.
- Action items: each with a named owner and a due date, tracked to closure.
SEV1 and SEV2 post-mortems are published externally at https://docs.mattermode.com/postmortems/<slug> 30 days after resolution. SEV3 post-mortems stay internal and are covered at engineering review.
What we will not do
- No PR-speak. We do not say "performance was suboptimal" when we mean "5% of requests returned 500 for 12 minutes." We do not say "limited number of users affected" when we mean "everyone in EU-Central for 8 minutes."
- No silent severity downgrades. A SEV1 stays SEV1 in the post-mortem record. If we got it wrong initially, we say so.
- No deferred customer notification on SEV1. Even if the cause turns out to be small, the page-storm and engineering attention warrant the customer-comms cost.
See also
- Severity matrix — how an incident gets classified.
- Customer SLA — credits earned by an outage.
- Matter API SLOs — internal targets.