On-call handover

Context for the next on-call rotation: open incidents, hot systems, deferred work, watch-outs.


On-call Handover — YYYY-MM-DD

Outgoing on-call: Name
Incoming on-call: Name
Handover period: YYYY-MM-DD HH:MM UTC → YYYY-MM-DD HH:MM UTC
Time zone note: Any time zone considerations for the incoming person

1. Rotation Details

Field	Detail
Outgoing on-call	Name
Incoming on-call	Name
Shift start (outgoing)	YYYY-MM-DD HH:MM UTC
Shift end (outgoing)	YYYY-MM-DD HH:MM UTC
Backup on-call (incoming shift)	Name

2. Open Incidents (Active or Recently Closed)

Incident	Severity	Status	Owner	Next action	Link
Incident title or ID	SEV-1/2/3	Active / Monitoring / Closed	Name	Describe what needs to happen next	Link

If there are no open incidents, state: "No open incidents at handover."

3. Recent Deploys (Last 48 h)

Service	Version	Deployed by	Deployed at	Notes / watch-outs
Service name	v0.0	Name	YYYY-MM-DD HH:MM UTC	Anything worth monitoring

If no deploys in the last 48 h, state: "No deploys in the last 48 h."

4. Hot Systems / Watch-outs

System	Why it's hot	What to watch for	Mitigation if it goes wrong
System name	Brief context	Specific metric or symptom	Mitigation step or runbook link

5. Deferred Work

Things that came up during your shift that you did not action — context for the incoming on-call.

Item: brief description and why it was deferred
Item: brief description and why it was deferred

If nothing deferred, state: "Nothing deferred."

6. Useful Links

Resource	Link
Monitoring dashboard	URL
Alerting console	URL
Incident runbook	URL
Status page admin	URL
On-call calendar	URL
Escalation contacts	URL or list

7. Sign-off

Outgoing on-call: Name — YYYY-MM-DD HH:MM UTC

Incoming on-call confirmed receipt: Name — YYYY-MM-DD HH:MM UTC

Anything else worth noting before you hand over:

On-call Handover — 2024-05-17 (Friday evening → Monday morning)

Outgoing on-call: Jordan Osei
Incoming on-call: Priya Mehta
Handover period: 2024-05-17 18:00 UTC → 2024-05-20 09:00 UTC
Time zone note: Jordan is in London (BST = UTC+1). Priya is in Berlin (CEST = UTC+2). All times in this document are UTC.
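As a quick sanity check on the time zone note, the handover window can be converted into each person's local time. The sketch below uses the fixed offsets stated in the note (BST = UTC+1, CEST = UTC+2), so it only holds while both locations are on summer time. The function and constant names are illustrative, not part of any existing tooling.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the time zone note (assumption: both zones stay on
# summer time for the whole handover window).
BST = timezone(timedelta(hours=1), "BST")
CEST = timezone(timedelta(hours=2), "CEST")

# Handover window from the header, in UTC.
HANDOVER_START = datetime(2024, 5, 17, 18, 0, tzinfo=timezone.utc)
HANDOVER_END = datetime(2024, 5, 20, 9, 0, tzinfo=timezone.utc)


def local_time(moment, tz):
    """Format a timezone-aware datetime as wall-clock time in the given zone."""
    return moment.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


for label, moment in (("start", HANDOVER_START), ("end", HANDOVER_END)):
    print(f"{label}: London {local_time(moment, BST)} | Berlin {local_time(moment, CEST)}")
```

Under these assumptions the shift starts at 19:00 BST (20:00 CEST) on Friday and ends at 10:00 BST (11:00 CEST) on Monday.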

1. Rotation Details

Field	Detail
Outgoing on-call	Jordan Osei
Incoming on-call	Priya Mehta
Shift start (outgoing)	2024-05-13 09:00 UTC
Shift end (outgoing)	2024-05-17 18:00 UTC
Backup on-call (incoming shift)	Sam Reid (DBA), Marcus Webb (Security)

2. Open Incidents (Active or Recently Closed)

Incident	Severity	Status	Owner	Next action	Link
INC-2024-047 Payment processing degradation	SEV-2	Closed — monitoring	Jordan Osei	Monitor `CheckoutErrorRate` and `DatabaseConnections` over the weekend. If error rate rises above 0.5%, follow the DB connection pool runbook section in `#inc-2024-05-17-payments-degraded`. Fatima (Engineering Director) is aware and expects an update Monday morning.	Notion/Incidents/INC-2024-047

Context on INC-2024-047: Checkout error rate hit 2.3% on Friday afternoon (2024-05-17) due to RDS connection pool exhaustion introduced by the v2.4 async payment orchestration layer. We mitigated it by increasing `max_connections` to 200 (hotfix v2.4.1); the underlying async connection lifecycle issue has not been fixed in code yet. The RCA is scheduled for Monday 2024-05-20 at 10:00 UTC, and I'll be on that call. You do not need to prepare anything for it unless something changes over the weekend.
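The 0.5% trigger in the incident row can be made concrete with a small check. This is a minimal sketch, assuming the error rate is computed as failed checkouts divided by total checkouts over some window. The function names and the sample counts are hypothetical, and the real `CheckoutErrorRate` metric may be defined differently.

```python
# Threshold from the "Next action" column: act if the error rate rises above 0.5%.
ERROR_RATE_THRESHOLD = 0.005


def checkout_error_rate(failed, total):
    """Return the fraction of failed checkouts; 0.0 when there is no traffic."""
    return failed / total if total else 0.0


def needs_runbook(failed, total, threshold=ERROR_RATE_THRESHOLD):
    """True when the error rate is strictly above the threshold."""
    return checkout_error_rate(failed, total) > threshold


# Hypothetical sample windows, not real traffic data.
for failed, total in ((23, 1000), (3, 1000), (0, 0)):
    rate = checkout_error_rate(failed, total)
    print(f"{failed}/{total} -> {rate:.1%} -> follow runbook: {needs_runbook(failed, total)}")
```

A window of 23 failures in 1,000 checkouts corresponds to the 2.3% seen during the incident and is well above the trigger; 3 in 1,000 (0.3%) is not.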

3. Recent Deploys (Last 48 h)

Service	Version	Deployed by	Deployed at	Notes / watch-outs
Payments API	v2.4.1 (hotfix)	Sam Reid	2024-05-17 15:10 UTC	Increased `max_connections` to 200. Low-risk config change — required RDS instance restart (< 30 s downtime, no alerts fired).
Notification Service	v1.9.3	Dev Patel	2024-05-16 14:22 UTC	Routine dependency bump. No issues observed in the 28 h since deploy.

4. Hot Systems / Watch-outs

System	Why it's hot	What to watch for	Mitigation if it goes wrong
Payments API (RDS connection pool)	Just recovered from INC-2024-047. The fix is live but the root cause (async connection lifecycle) has not been addressed in code yet.	`DatabaseConnections` metric in Datadog. The alert threshold is 160/200 (80%), but the monitor is still in draft (see INFRA-891 under Deferred Work), so check the metric manually until it is activated. If you see it climbing steadily, investigate immediately.	Runbook: `#inc-2024-05-17-payments-degraded` (pinned). Emergency: increase `max_connections` again to 300 (Sam Reid has the RDS credentials and knows the process).
Search Service	Elasticsearch cluster has been running at 70% disk usage for two weeks. Ticket INFRA-892 is open. Not urgent but could become SEV-3 if disk fills.	Datadog → Elasticsearch dashboard → `Disk Usage %`. Alert fires at 85%.	Contact Dev Patel, who owns INFRA-892. Do not delete indices without checking with him first.
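The 160/200 (80%) figure for the connection pool is simple arithmetic, and it shifts if `max_connections` is raised again. The sketch below assumes the alert is meant to fire at or above 80% of the configured maximum. The helper names and the sample connection count of 170 are hypothetical.

```python
# Alert level as a whole-number percentage, to avoid floating-point rounding.
ALERT_PERCENT = 80


def alert_threshold(max_connections, percent=ALERT_PERCENT):
    """Connection count at which the alert should fire."""
    return max_connections * percent // 100


def pool_status(current, max_connections):
    """Summarise how close the pool is to its alert threshold and hard limit."""
    threshold = alert_threshold(max_connections)
    return {
        "threshold": threshold,
        "alerting": current >= threshold,
        "headroom": max_connections - current,
    }


# Hypothetical reading of 170 connections under the current and emergency limits.
print(pool_status(170, 200))
print(pool_status(170, 300))
```

With `max_connections` at 200 the threshold is 160, so a reading of 170 would be alerting. After the emergency increase to 300, an 80% threshold would sit at 240, so any monitor using an absolute count would need updating to match.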

5. Deferred Work

INFRA-891 — Add DatabaseConnections alert: I created the Datadog monitor but it is not yet saved (I ran out of time). The monitor is in draft at Datadog/Monitors/Drafts — "DB Connections 80% ceiling". Please save and activate it on Monday morning, or ask Sam Reid to do so. This is listed as a P1 action item in the INC-2024-047 RCA.
PagerDuty schedule audit — Fatima asked us to verify the on-call rotation is correct for June. I have not done this yet. Low urgency — can wait until next week. The ask is in #platform-oncall (search "rotation June").

6. Useful Links

Resource	Link
Monitoring dashboard	grafana.acmecorp.internal/d/platform-overview
Alerting console	app.datadoghq.com/monitors/manage
Incident runbook	Notion/Runbooks/Incident-Response
Status page admin	manage.statuspage.io (credentials in 1Password → "StatusPage Admin")
On-call calendar	PagerDuty → Schedules → "Platform On-call"
Escalation contacts	Notion/Runbooks/Escalation-Contacts

7. Sign-off

Outgoing on-call: Jordan Osei — 2024-05-17 17:55 UTC

Incoming on-call confirmed receipt: Priya Mehta — 2024-05-17 18:03 UTC

Jordan's parting note: The weekend is likely to be quiet — no planned deploys, no sales events. The only thing I am actively watching is the INC-2024-047 recovery. If anything comes up, Sam Reid is a great first call for anything infrastructure-related. Have a good shift.
