On-call handover
Matter's weekly on-call handover protocol. Every Monday at 10:00 UTC the outgoing primary briefs the incoming primary on open incidents, follow-ups, and watch-this-space items.
Every Monday at 10:00 UTC the outgoing on-call primary briefs the incoming on-call primary. The handover is the most expensive 15 minutes of the week, and it is also the cheapest insurance against an incident in week N+1 caused by something that was already a known issue in week N.
This document is the protocol. It is reviewed quarterly by the engineering lead.
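The schedule above can be computed rather than remembered. The following is a minimal sketch, not part of the protocol itself: the function name `next_handover` and the constants are hypothetical, and it assumes the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

HANDOVER_WEEKDAY = 0    # Monday, as returned by datetime.weekday()
HANDOVER_HOUR_UTC = 10  # 10:00 UTC, per the protocol


def next_handover(now: datetime) -> datetime:
    """Return the next Monday 10:00 UTC at or after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    days_ahead = (HANDOVER_WEEKDAY - now_utc.weekday()) % 7
    candidate = (now_utc + timedelta(days=days_ahead)).replace(
        hour=HANDOVER_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    # If it is already Monday and 10:00 UTC has passed, roll to next week.
    if candidate < now_utc:
        candidate += timedelta(days=7)
    return candidate
```

Called on a Wednesday, this returns the following Monday at 10:00 UTC; called on a Monday before 10:00 UTC, it returns the same day.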
Format
Synchronous video call. 15 minutes is the target; 30 minutes is the ceiling. Recorded for asynchronous catch-up if the incoming primary is unavailable at the scheduled time.
Attendees.
- Outgoing primary on-call.
- Incoming primary on-call.
- Outgoing secondary (optional; encouraged).
- Incoming secondary (optional; encouraged).
- Engineering lead (optional; attends every other handover).
Output. Handover note committed to apps/docs/internal/handovers/<yyyy-mm-dd>-week-<n>.mdx by the outgoing primary before the call ends. Future on-call rotations can read past handovers to spot trends.
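To keep note filenames consistent, the path can be derived from the handover date. This is a sketch under two assumptions the protocol does not state: that `<n>` is the ISO 8601 week number of the handover date, and that it is written without zero-padding. The helper name `handover_note_path` is hypothetical.

```python
from datetime import date

HANDOVER_DIR = "apps/docs/internal/handovers"


def handover_note_path(handover_date: date) -> str:
    """Build the handover note path for a given handover date."""
    # Assumption: <n> is the ISO 8601 week number, not zero-padded.
    week = handover_date.isocalendar()[1]
    return f"{HANDOVER_DIR}/{handover_date.isoformat()}-week-{week}.mdx"
```

For example, `handover_note_path(date(2024, 1, 8))` yields `apps/docs/internal/handovers/2024-01-08-week-2.mdx`. If the team counts weeks differently (for instance, by rotation number), only the `week` line needs to change.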
Agenda
1. Open incidents (5 minutes)
Outgoing primary walks through:
- Any unresolved SEV1 / SEV2 from the week. Status. Next action. Who is doing it.
- Any unresolved SEV3 from the week. Triage status.
- Any open follow-up actions from earlier post-mortems with deadlines in the coming week.
2. Watch-this-space (5 minutes)
Outgoing primary names:
- Any alerts that fired this week with no real impact (false-positive watch).
- Any leading indicator approaching its threshold (capacity-plan awareness).
- Any deploy expected during the incoming week that warrants extra attention.
- Any customer-side concern raised that did not become an incident but should be tracked.
- Any chaos-test failure that did not result in a real-impact event but exposed a fragility.
3. Personnel and availability (2 minutes)
Outgoing primary confirms:
- Who else on the team is unavailable during the incoming week (vacations, leave).
- Whether the secondary is available.
- Whether the manager on-call has any blackout windows.
4. Questions (3 minutes)
Incoming primary asks anything unclear. Outgoing primary answers, or commits to follow up asynchronously.
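The four agenda items add up to the 15-minute target. As an illustrative sketch only, the time budget can be written down as data and checked against the target and the 30-minute ceiling; the names `AGENDA_MINUTES` and `overrun_status` are hypothetical.

```python
# Time budget per agenda item, in minutes, as listed in the agenda above.
AGENDA_MINUTES = {
    "Open incidents": 5,
    "Watch-this-space": 5,
    "Personnel and availability": 2,
    "Questions": 3,
}

TARGET_MINUTES = 15   # target length of the call
CEILING_MINUTES = 30  # hard ceiling for the call


def overrun_status(elapsed_minutes: float) -> str:
    """Classify a call's length against the target and the ceiling."""
    if elapsed_minutes <= TARGET_MINUTES:
        return "on target"
    if elapsed_minutes <= CEILING_MINUTES:
        return "over target"
    return "over ceiling"


# The itemised budget should match the stated target.
assert sum(AGENDA_MINUTES.values()) == TARGET_MINUTES
```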
What the handover note looks like
Template at apps/docs/internal/handovers/_template.mdx:
---
title: "Week N handover — YYYY-MM-DD"
outgoing: <name>
incoming: <name>
---
## Open incidents
- [SEV3] _short title_. Status: _description_. Next: _action_ by _person_ by _date_. Ticket: <link>.
## Watch this space
- _Concern_. Why it might escalate: _description_. Recommended action: _description_.
## Personnel
- _Person A_ out _dates_.
- _Person B_ on conference _dates_; reachable async only.
## Notes from outgoing
_Any tribal knowledge, vibes, gut-feel concerns the incoming primary should know._

Asynchronous handover (when sync is impossible)
If the incoming primary cannot attend the live handover (timezone, leave, illness):
- Outgoing primary records the live call (or records a 15-minute video walkthrough of the handover note).
- The recording is posted in the on-call channel.
- Incoming primary watches the recording within 4 hours of taking the rotation.
- Incoming primary posts an "ack" in the on-call channel, along with their first questions.
- Outgoing primary remains available asynchronously for 24 hours to answer follow-up questions.
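The two time limits in the asynchronous flow can be turned into concrete deadlines. This sketch assumes both windows start when the incoming primary takes the rotation; the protocol states this for the 4-hour window but does not say when the 24-hour window begins, so that part is an assumption. All names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

WATCH_WITHIN = timedelta(hours=4)             # incoming must watch the recording
OUTGOING_AVAILABLE_FOR = timedelta(hours=24)  # outgoing answers follow-ups


class AsyncHandoverDeadlines(NamedTuple):
    watch_recording_by: datetime
    outgoing_available_until: datetime


def async_deadlines(rotation_start: datetime) -> AsyncHandoverDeadlines:
    """Compute async-handover deadlines from the rotation start time.

    Assumption: the 24-hour availability window is measured from the
    rotation start, like the 4-hour watch window.
    """
    return AsyncHandoverDeadlines(
        watch_recording_by=rotation_start + WATCH_WITHIN,
        outgoing_available_until=rotation_start + OUTGOING_AVAILABLE_FOR,
    )
```

For a rotation that starts on Monday at 10:00 UTC, the recording must be watched by 14:00 UTC that day, and the outgoing primary stays reachable until Tuesday at 10:00 UTC.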
What the handover prevents
Three failure modes:
- Stale-issue surprise. A SEV3 that was triaged on Friday but not closed is rediscovered on Wednesday as a SEV2. The handover catches it.
- Leading-indicator regression. A leading indicator (replica lag, queue depth, growth rate) approaches its alert threshold in week N; the alert fires in week N+1 to an on-call who has no context. The handover sets context.
- Customer-trust gap. A customer raised a concern in week N; week N+1's on-call has no awareness; the customer asks again and gets a different answer. The handover bridges.
How the handover changes during an active SEV1 or SEV2
If a SEV1 or SEV2 is in flight at the handover boundary:
- No handover. The outgoing primary stays in the incident until it resolves or until a manual transfer is explicitly negotiated.
- The incoming primary assumes responsibility for non-incident on-call (SEV3 + SEV4) while the outgoing primary continues the incident.
- The handover happens after the incident resolves, and scheduling the post-mortem is added to it as an extra agenda item.
The protocol explicitly does not transfer SEV1 / SEV2 ownership at a scheduled boundary. Continuity matters more than schedule adherence.
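The boundary rule can be expressed as a small decision function. This is a sketch of the rule as described above, not tooling that exists in the repository; the `BoundaryPlan` type, the `plan_boundary` function, and the string labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAJOR_SEVERITIES = {"SEV1", "SEV2"}


@dataclass(frozen=True)
class BoundaryPlan:
    run_handover_now: bool               # False means: defer until resolved
    major_incident_owner: Optional[str]  # "outgoing", "incoming", or None
    routine_on_call_owner: str           # who takes SEV3 + SEV4


def plan_boundary(
    open_severities: Iterable[str],
    transfer_negotiated: bool = False,
) -> BoundaryPlan:
    """Decide ownership at the scheduled handover boundary."""
    major_in_flight = any(s in MAJOR_SEVERITIES for s in open_severities)
    if not major_in_flight:
        # Normal week: the handover runs and the incoming primary takes over.
        return BoundaryPlan(True, None, "incoming")
    # SEV1/SEV2 in flight: no handover at the boundary. The outgoing primary
    # keeps the incident unless a manual transfer was explicitly negotiated.
    owner = "incoming" if transfer_negotiated else "outgoing"
    return BoundaryPlan(False, owner, "incoming")
```

In every branch the incoming primary picks up routine on-call (SEV3 and SEV4); only ownership of the major incident and the timing of the handover change.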
Handover when the primary departs
When an engineer permanently leaves the on-call rotation:
- They author a comprehensive handover doc at apps/docs/internal/handovers/departure-<name>-<yyyy-mm-dd>.mdx covering: known issues, in-flight work, undocumented tribal knowledge, relationships with external vendors / providers, vendor-account credentials in the team's vault.
- They are available for questions for 30 days.
- The CTO acknowledges receipt of the handover doc.
This is part of the ownership matrix review.
See also
- Ownership matrix — who's in the rotation.
- Severity matrix — what each severity triggers.
- Incident communications — customer-facing templates.