Severity levels
sev LEV-ulz
A classification system (SEV-1 through SEV-4) that ranks incidents by impact to determine response priority.
Severity levels (sev levels) classify incidents by how bad they are. Many companies use a four-tier system: SEV-1 is catastrophic (total outage), SEV-2 is major (significant feature broken), SEV-3 is moderate (minor feature impacted), and SEV-4 is low (cosmetic or edge case). The severity determines who gets paged, how fast you respond, and how many resources you throw at the problem.
Clear severity definitions prevent arguments during incidents. "Is this a SEV-1 or SEV-2?" is a waste of time when customers are down. Good definitions are objective: "SEV-1: more than 50% of users cannot complete core workflows" leaves little room for debate. Bad definitions are subjective: "SEV-1: something really bad happened."
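One way to keep definitions objective is to encode them as thresholds on measurable impact. This is a minimal sketch. It assumes impact is summarized as the percentage of users blocked from core workflows, plus a flag for real (non-cosmetic) breakage. Only the 50% SEV-1 cutoff comes from the text. The `Impact` type, the `classify` function, and the lower cutoffs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    # Percentage (0-100) of users who cannot complete a core workflow.
    pct_users_blocked: float
    # True if something is actually broken or degraded, not just cosmetic.
    functional_impact: bool


def classify(impact: Impact) -> str:
    """Map measured impact to a severity level.

    The >50% SEV-1 rule mirrors the example definition; the other
    cutoffs are hypothetical placeholders a team would tune.
    """
    if impact.pct_users_blocked > 50:
        return "SEV-1"
    if impact.pct_users_blocked > 0:
        return "SEV-2"  # hypothetical: any users blocked on a core workflow
    if impact.functional_impact:
        return "SEV-3"  # hypothetical: degraded, but nobody fully blocked
    return "SEV-4"      # cosmetic only


print(classify(Impact(pct_users_blocked=80, functional_impact=True)))  # SEV-1
print(classify(Impact(pct_users_blocked=0, functional_impact=True)))   # SEV-3
```

Every branch tests a number or a boolean. Two engineers looking at the same dashboard therefore reach the same answer.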
Severity levels also drive process. A SEV-1 might require an incident commander, a dedicated Slack channel, executive notification, and a postmortem within 48 hours. A SEV-4 might just need a Jira ticket and a fix in the next sprint. Without severity levels, every incident gets the same response, which means either under-reacting to real outages or over-reacting to minor issues.
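Writing these process differences down as data lets tooling, rather than memory, decide what a given severity requires. This is a minimal sketch with hypothetical names (`PROCESS`, `postmortem_due`). It covers only the SEV-1 and SEV-4 examples from the text. It also assumes the 48-hour postmortem clock starts when the incident is resolved.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy table mirroring the SEV-1 and SEV-4 examples.
PROCESS = {
    "SEV-1": {
        "incident_commander": True,
        "dedicated_channel": True,
        "notify_executives": True,
        "postmortem_within": timedelta(hours=48),
    },
    "SEV-4": {
        "incident_commander": False,
        "dedicated_channel": False,
        "notify_executives": False,
        # No postmortem: tracked as a Jira ticket and fixed next sprint.
        "postmortem_within": None,
    },
}


def postmortem_due(severity: str, resolved_at: datetime) -> Optional[datetime]:
    """Return the postmortem deadline, or None if none is required.

    Assumes the window is measured from resolution time.
    """
    window = PROCESS[severity]["postmortem_within"]
    return resolved_at + window if window is not None else None


print(postmortem_due("SEV-1", datetime(2024, 3, 1, 9, 0)))  # 2024-03-03 09:00:00
print(postmortem_due("SEV-4", datetime(2024, 3, 1, 9, 0)))  # None
```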
Examples
A company defines its severity levels.
SEV-1: complete outage or data loss affecting all users. Response: page everyone, all hands on deck, 15-minute status updates.
SEV-2: major feature broken for a significant subset. Response: page on-call team, dedicated incident channel.
SEV-3: minor feature degraded. Response: next business day.
SEV-4: cosmetic issue. Response: fix in next sprint.
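Definitions like these are often kept as configuration that paging and status-update tooling can read. This is a minimal sketch. It assumes a hypothetical `ResponsePolicy` record and a hypothetical `update_overdue` helper. Only SEV-1 gets a status-update cadence, because the example definitions specify one only for SEV-1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ResponsePolicy:
    scope: str
    response: str
    # How often stakeholders must get a status update (None = not specified).
    update_every: Optional[timedelta] = None


POLICIES = {
    "SEV-1": ResponsePolicy("Complete outage or data loss affecting all users",
                            "Page everyone, all hands on deck",
                            timedelta(minutes=15)),
    "SEV-2": ResponsePolicy("Major feature broken for a significant subset",
                            "Page on-call team, dedicated incident channel"),
    "SEV-3": ResponsePolicy("Minor feature degraded", "Next business day"),
    "SEV-4": ResponsePolicy("Cosmetic issue", "Fix in next sprint"),
}


def update_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """True if the required status-update interval has elapsed."""
    interval = POLICIES[severity].update_every
    return interval is not None and now - last_update > interval


last = datetime(2024, 3, 1, 10, 0)
print(update_overdue("SEV-1", last, datetime(2024, 3, 1, 10, 20)))  # True
print(update_overdue("SEV-3", last, datetime(2024, 3, 1, 10, 20)))  # False
```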
An on-call engineer triages an alert.
An alert fires: 'Login failure rate above 10%.' The engineer checks scope. It only affects users authenticating with SAML SSO, roughly 5% of total logins. Core username/password login works fine. They classify it as SEV-2 (major feature broken for a subset), page the identity team, and begin investigation.
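The scope check is the step that makes this a SEV-2 rather than a SEV-1. Here is a minimal sketch of that check. It assumes login attempts and failures are broken down by authentication method. The counts, the `scope` and `triage` helpers, and the 10% per-method threshold are hypothetical, and login is treated as a core feature.

```python
# Hypothetical counts from the alert window, keyed by auth method.
attempts = {"password": 9_500, "saml_sso": 500}
failures = {"password": 19, "saml_sso": 480}


def scope(attempts: dict[str, int], failures: dict[str, int],
          threshold: float = 0.10) -> tuple[list[str], float]:
    """Return the failing auth methods and their share of all logins."""
    total = sum(attempts.values())
    broken = [m for m, n in attempts.items()
              if n and failures.get(m, 0) / n > threshold]
    share = sum(attempts[m] for m in broken) / total
    return broken, share


def triage(broken: list[str], share: float) -> str:
    """Apply the example definitions, treating login as a core feature."""
    if not broken:
        return "no incident"
    if share >= 1.0:
        return "SEV-1"  # every login path is failing: total outage
    return "SEV-2"      # a core feature is broken for a subset of users


broken, share = scope(attempts, failures)
print(broken, f"{share:.0%}")  # ['saml_sso'] 5%
print(triage(broken, share))   # SEV-2
```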
A team reviews incident severity trends.
Over six months, the team logged 3 SEV-1s, 8 SEV-2s, 15 SEV-3s, and 22 SEV-4s. All three SEV-1s involved the same payment processing pipeline. The VP of Engineering approves a reliability project specifically targeting that pipeline. Severity data turned an opinion ('payments feels fragile') into a funded project.
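The analysis behind that decision is a simple group-and-count over the incident log. This is a minimal sketch. It assumes each incident is recorded as a hypothetical `(severity, component)` pair. The short log below is made up for illustration and is not the team's data.

```python
from collections import Counter

# Hypothetical incident log: (severity, component) pairs.
incidents = [
    ("SEV-1", "payments-pipeline"),
    ("SEV-2", "search"),
    ("SEV-3", "notifications"),
    ("SEV-1", "payments-pipeline"),
    ("SEV-4", "dashboard"),
    ("SEV-2", "payments-pipeline"),
    ("SEV-1", "payments-pipeline"),
]

# How many incidents landed at each severity level.
by_severity = Counter(sev for sev, _ in incidents)

# Which components the most severe incidents cluster in.
sev1_hotspots = Counter(comp for sev, comp in incidents if sev == "SEV-1")

for level in ("SEV-1", "SEV-2", "SEV-3", "SEV-4"):
    print(level, by_severity[level])
print(sev1_hotspots.most_common(1))  # [('payments-pipeline', 3)]
```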
Frequently asked questions
What is the difference between severity and priority?
Severity measures impact: how bad is the problem? Priority measures urgency: how soon should we fix it? A SEV-1 incident (total outage) is always high priority. But a SEV-3 issue (minor bug) might become high priority if it affects a key customer demo happening tomorrow. Many teams use severity for incidents and priority for bugs in their backlog.
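Keeping the two fields separate lets business context raise the priority without rewriting the impact assessment. This is a minimal sketch. It assumes a hypothetical P1 to P4 priority scale. It also assumes a hypothetical rule that anything blocking a key-customer event within 24 hours jumps to P1.

```python
from typing import Optional

# Hypothetical default mapping from severity (impact) to priority (urgency).
DEFAULT_PRIORITY = {"SEV-1": "P1", "SEV-2": "P2", "SEV-3": "P3", "SEV-4": "P4"}


def priority(severity: str, key_customer_deadline_hours: Optional[float] = None) -> str:
    """Derive urgency from impact, then adjust for business context."""
    if severity == "SEV-1":
        return "P1"  # a total outage is always urgent
    if key_customer_deadline_hours is not None and key_customer_deadline_hours <= 24:
        return "P1"  # e.g. a minor bug that would break tomorrow's demo
    return DEFAULT_PRIORITY[severity]


print(priority("SEV-3"))                                  # P3
print(priority("SEV-3", key_customer_deadline_hours=20))  # P1
```

In the second call the severity stays SEV-3. Only the urgency changes.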
Who decides the severity level?
The on-call engineer or incident commander makes the initial call based on predefined criteria. They can escalate or downgrade as more information emerges. The key is having objective definitions so the decision is fast and defensible. Debating severity during an active incident wastes time that should be spent fixing the problem.
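Because the first call is provisional, it helps to record each change and the reason for it, which can also help when writing the postmortem. This is a minimal sketch with a hypothetical `Incident` class. It assumes severities follow the SEV-N naming, where a lower number means higher severity. The reason strings are made up.

```python
from dataclasses import dataclass, field


def rank(severity: str) -> int:
    # "SEV-2" -> 2; lower numbers are more severe.
    return int(severity.split("-")[1])


@dataclass
class Incident:
    severity: str
    # (old severity, new severity, reason) for each reclassification.
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def reclassify(self, new_severity: str, reason: str) -> str:
        """Change severity, log why, and report the direction of the change."""
        old = self.severity
        if rank(new_severity) == rank(old):
            return "unchanged"
        direction = "escalated" if rank(new_severity) < rank(old) else "downgraded"
        self.history.append((old, new_severity, reason))
        self.severity = new_severity
        return direction


inc = Incident("SEV-3")
print(inc.reclassify("SEV-2", "errors spreading to a second region"))  # escalated
print(inc.reclassify("SEV-3", "rollback restored most traffic"))       # downgraded
```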
Related terms
Incident: an unplanned event that disrupts a service or degrades it below its expected quality, requiring a coordinated response.
Postmortem: a written analysis of an incident covering what happened, why, and what the team will do to prevent it from recurring.
On-call: a rotation where engineers are responsible for responding to production alerts and incidents outside business hours.
Service level objective: an internal reliability target for a service, like 99.9% availability or p99 latency under 200ms.
Uptime: the percentage of time a system is operational and accessible to users.
