Severity levels

Severity levels (sev levels) classify incidents by how bad they are. Many companies use a four-tier system: SEV-1 is catastrophic (total outage), SEV-2 is major (significant feature broken), SEV-3 is moderate (minor feature impacted), and SEV-4 is low (cosmetic or edge case). The severity determines who gets paged, how fast you respond, and how many resources you throw at the problem.

Clear severity definitions prevent arguments during incidents. Arguing over "Is this a SEV-1 or SEV-2?" wastes time while customers are down. Good definitions are objective: "SEV-1: more than 50% of users cannot complete core workflows" leaves little room for debate. Bad definitions are subjective: "SEV-1: something really bad happened."
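An objective definition is one you can evaluate from numbers alone. The sketch below turns the example SEV-1 rule into a function; the function name and its inputs (`users_blocked`, `total_users`) are hypothetical, and only the 50% threshold comes from the definition above.

```python
# Threshold taken from the example SEV-1 definition; swap in your own
# written criterion. The function and parameter names are hypothetical.
SEV1_BLOCKED_FRACTION = 0.5


def meets_sev1(users_blocked: int, total_users: int) -> bool:
    """Objective SEV-1 test: more than 50% of users cannot complete core workflows."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    # Strictly greater than: exactly 50% does not qualify under this wording.
    return users_blocked / total_users > SEV1_BLOCKED_FRACTION


print(meets_sev1(600, 1000))  # True
print(meets_sev1(500, 1000))  # False
```

Because the check is a comparison against a written number, two engineers looking at the same dashboard reach the same answer.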

Severity levels also drive process. A SEV-1 might require an incident commander, a dedicated Slack channel, executive notification, and a postmortem within 48 hours. A SEV-4 might just need a Jira ticket and a fix in the next sprint. Without severity levels, every incident gets the same response, which means either under-reacting to real outages or over-reacting to minor issues.
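One way to make "severity drives process" concrete is to store the required process per level and derive deadlines from it. This is a minimal sketch: the `ResponseProcess` fields are hypothetical, SEV-1 and SEV-4 follow the examples above, the SEV-2 and SEV-3 entries are assumptions, and it assumes the 48-hour postmortem clock starts when the incident is declared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ResponseProcess:
    incident_commander: bool
    dedicated_channel: bool
    executive_notification: bool
    postmortem_within: Optional[timedelta]  # None means no postmortem required


# SEV-1 and SEV-4 follow the examples in the text; SEV-2 and SEV-3 are
# illustrative assumptions.
PROCESS = {
    1: ResponseProcess(True, True, True, timedelta(hours=48)),
    2: ResponseProcess(True, True, False, timedelta(days=5)),
    3: ResponseProcess(False, False, False, None),
    4: ResponseProcess(False, False, False, None),  # just a ticket, next sprint
}


def postmortem_due(severity: int, declared_at: datetime) -> Optional[datetime]:
    """When the postmortem is due (assumed: measured from declaration), or None."""
    window = PROCESS[severity].postmortem_within
    return None if window is None else declared_at + window


declared = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(postmortem_due(1, declared))  # 2024-01-03 09:00:00+00:00
print(postmortem_due(4, declared))  # None
```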

Examples

A company defines its severity levels.

SEV-1: complete outage or data loss affecting all users. Response: page everyone, all hands on deck, 15-minute status updates.

SEV-2: major feature broken for a significant subset. Response: page on-call team, dedicated incident channel.

SEV-3: minor feature degraded. Response: next business day.

SEV-4: cosmetic issue. Response: fix in next sprint.
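Definitions like these are often kept as configuration so alerting and runbooks read from one source. The sketch below transcribes the example table into Python; the `SeverityPolicy` schema and the `describe` helper are hypothetical, and it assumes that a "next business day" or "next sprint" response means nobody is paged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    definition: str
    page: Optional[str]                   # None = nobody is paged (assumed)
    response: str
    status_update_minutes: Optional[int]  # None = no cadence specified


# Transcribed from the example definitions; the schema itself is hypothetical.
POLICY = {
    "SEV-1": SeverityPolicy("complete outage or data loss affecting all users",
                            "everyone", "all hands on deck", 15),
    "SEV-2": SeverityPolicy("major feature broken for a significant subset",
                            "on-call team", "dedicated incident channel", None),
    "SEV-3": SeverityPolicy("minor feature degraded",
                            None, "next business day", None),
    "SEV-4": SeverityPolicy("cosmetic issue",
                            None, "fix in next sprint", None),
}


def describe(level: str) -> str:
    """One-line summary of who gets paged and how the team responds."""
    p = POLICY[level]
    who = f"page {p.page}" if p.page else "no page"
    return f"{level}: {who}; {p.response}"


print(describe("SEV-2"))  # SEV-2: page on-call team; dedicated incident channel
```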

An on-call engineer triages an alert.

An alert fires: 'Login failure rate above 10%.' The engineer checks scope. It only affects users authenticating with SAML SSO, roughly 5% of total logins. Core username/password login works fine. They classify it as SEV-2 (major feature broken for a subset), page the identity team, and begin investigation.
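"Checking scope" usually means splitting an aggregate alert by some dimension to see who is actually affected. A minimal sketch of that breakdown by authentication method follows; the counts in `WINDOW` are invented for illustration and are not taken from the scenario above, and the method names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical counts for one alert window: method -> (attempts, failures).
# Invented for illustration; not taken from the scenario above.
WINDOW: Dict[str, Tuple[int, int]] = {
    "password": (9_500, 95),
    "saml_sso": (500, 480),
}


def scope_by_method(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Split an aggregate login-failure alert by authentication method."""
    total_attempts = sum(attempts for attempts, _ in counts.values())
    report = {}
    for method, (attempts, failures) in counts.items():
        report[method] = {
            # What fraction of all logins use this method?
            "share_of_logins": attempts / total_attempts,
            # How often do logins via this method fail?
            "failure_rate": failures / attempts if attempts else 0.0,
        }
    return report


for method, stats in scope_by_method(WINDOW).items():
    print(f"{method}: {stats['share_of_logins']:.0%} of logins, "
          f"{stats['failure_rate']:.0%} failing")
```

A breakdown like this shows at a glance whether one path is failing while the core path stays healthy, which is the distinction the engineer uses to choose a severity.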

A team reviews incident severity trends.

Over six months, the team logged 3 SEV-1s, 8 SEV-2s, 15 SEV-3s, and 22 SEV-4s. All three SEV-1s involved the same payment processing pipeline. The VP of Engineering approves a reliability project specifically targeting that pipeline. Severity data turned an opinion ('payments feels fragile') into a funded project.
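Trend reviews like this amount to counting incidents by severity and by component. The sketch below rebuilds the six-month totals as a hypothetical log of `(severity, component)` pairs; only the per-severity counts and the fact that all SEV-1s hit the payment pipeline come from the example, and the other components are left as "unspecified" because the text does not name them.

```python
from collections import Counter

# Hypothetical incident log of (severity, component) pairs. The per-severity
# totals match the six-month example; components other than the payment
# pipeline are not given in the text, so they are left "unspecified".
incidents = (
    [("SEV-1", "payments")] * 3
    + [("SEV-2", "unspecified")] * 8
    + [("SEV-3", "unspecified")] * 15
    + [("SEV-4", "unspecified")] * 22
)

# How many incidents landed at each severity level?
by_severity = Counter(sev for sev, _ in incidents)

# Which components are behind the most severe incidents?
sev1_components = Counter(comp for sev, comp in incidents if sev == "SEV-1")

print(dict(by_severity))               # {'SEV-1': 3, 'SEV-2': 8, 'SEV-3': 15, 'SEV-4': 22}
print(sev1_components.most_common(1))  # [('payments', 3)]
```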

Frequently asked questions

What is the difference between severity and priority?

Severity measures impact: how bad is the problem? Priority measures urgency: how soon should we fix it? A SEV-1 incident (total outage) is always high priority. But a SEV-3 issue (minor bug) might become high priority if it affects a key customer demo happening tomorrow. Many teams use severity for incidents and priority for bugs in their backlog.
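The split between impact and urgency can be sketched as a function that starts from severity and lets business context raise the result. The `Issue` type, the `blocks_key_customer_event` flag, and the SEV-2 to SEV-4 baseline mapping are all hypothetical; only the two rules in the comments come from the answer above.

```python
from dataclasses import dataclass

# Baseline priority by severity; the SEV-2..SEV-4 mapping is an assumption.
BASE_PRIORITY = {1: "high", 2: "medium", 3: "low", 4: "low"}


@dataclass
class Issue:
    severity: int                            # impact: 1 (worst) to 4
    blocks_key_customer_event: bool = False  # hypothetical urgency flag


def priority(issue: Issue) -> str:
    """Combine impact (severity) with business urgency to get a priority."""
    if issue.severity == 1:
        return "high"  # a total outage is always high priority
    if issue.blocks_key_customer_event:
        return "high"  # e.g. a minor bug that breaks tomorrow's customer demo
    return BASE_PRIORITY[issue.severity]


print(priority(Issue(severity=3)))                                  # low
print(priority(Issue(severity=3, blocks_key_customer_event=True)))  # high
```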

Who decides the severity level?

The on-call engineer or incident commander makes the initial call based on predefined criteria. They can escalate or downgrade as more information emerges. The key is having objective definitions so the decision is fast and defensible. Debating severity during an active incident wastes time that should be spent fixing the problem.
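Escalating and downgrading are easier to defend afterward if every change is recorded with a reason. A minimal sketch, assuming severities are stored as integers from 1 (most severe) to 4; the `Incident` class and its methods are hypothetical, not part of any incident-management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Incident:
    severity: int  # 1 = most severe, 4 = least severe
    # Each entry: (when, old severity, new severity, reason)
    history: List[Tuple[datetime, int, int, str]] = field(default_factory=list)

    def reclassify(self, new_severity: int, reason: str) -> None:
        """Record a severity change so the decision stays defensible later."""
        if not 1 <= new_severity <= 4:
            raise ValueError("severity must be between 1 and 4")
        if new_severity != self.severity:
            self.history.append(
                (datetime.now(timezone.utc), self.severity, new_severity, reason))
            self.severity = new_severity

    def escalate(self, reason: str) -> None:
        self.reclassify(max(1, self.severity - 1), reason)  # toward SEV-1

    def downgrade(self, reason: str) -> None:
        self.reclassify(min(4, self.severity + 1), reason)  # toward SEV-4


incident = Incident(severity=3)
incident.escalate("failures spreading beyond the first region")
incident.downgrade("mitigation in place, impact limited again")
print(incident.severity, len(incident.history))  # 3 2
```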
