SLO

ess-ell-oh

Service level objective: an internal reliability target for a service, like 99.9% availability or p99 latency under 200ms.

An SLO (service level objective) is an internal target your team sets for a specific reliability metric. It answers the question: "How reliable does this service need to be?" Not perfectly reliable. Reliable enough for users and the business.

SLOs sit between SLIs and SLAs. Your SLI measures something (request latency, error rate). Your SLO sets a target for that measurement (p99 latency under 200ms). Your SLA makes an external promise to customers based on that target, usually a looser one so the team has some margin. Google popularized the practice through its SRE book, and many infrastructure teams now define SLOs for their production services.
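The SLI-to-SLO relationship can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring implementation: the function names, the sample latencies, and the choice of the nearest-rank percentile method are all assumptions made for the example.

```python
def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (integer p, 1..100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, so no floating-point rounding is involved.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms, p=99, target_ms=200):
    """The SLI is the p-th percentile latency; the SLO is 'SLI stays under target_ms'."""
    return percentile_nearest_rank(latencies_ms, p) < target_ms


# Hypothetical latency samples in milliseconds.
samples = [120, 95, 180, 150, 210, 130, 110, 140, 160, 100]
print(percentile_nearest_rank(samples, 99))  # 210
print(meets_latency_slo(samples))            # False: p99 is over the 200ms target
```

Real systems usually compute percentiles from histograms in the metrics backend rather than from raw samples, but the comparison between measurement and target is the same.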

The key insight behind SLOs is the concept of an error budget. If your SLO is 99.9% availability, you have a 0.1% error budget, which over a 30-day month is roughly 43 minutes of allowed downtime. When the budget is healthy, teams ship fast. When it is nearly spent, teams slow down and focus on reliability. This creates a natural balance between velocity and stability, with fewer arguments about priorities.
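The arithmetic behind an error budget is simple enough to check by hand. The sketch below assumes a time-based availability SLO and a 30-day window; the function name is hypothetical.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of downtime allowed by an availability SLO over the window."""
    window_minutes = window_days * 24 * 60
    return (100 - slo_percent) / 100 * window_minutes


for target in (99.5, 99.9, 99.99):
    print(f"{target}% -> {error_budget_minutes(target):.2f} minutes")
# 99.5% -> 216.00 minutes
# 99.9% -> 43.20 minutes
# 99.99% -> 4.32 minutes
```

Each extra nine cuts the budget by a factor of ten, which is why the target matters so much for how a team spends its time.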

Examples

A platform team defines SLOs for their API gateway.

They set three SLOs: 99.95% availability, p50 latency under 50ms, and p99 latency under 300ms. They measure these over a rolling 30-day window. Dashboards show real-time error budget consumption. When the remaining availability budget drops below 30%, the team freezes non-critical deployments.
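A freeze rule like this one could be sketched as follows. The example assumes availability is measured per request (successful requests divided by total requests in the window), which the scenario does not specify; the function names and request counts are hypothetical.

```python
def budget_remaining(total_requests, failed_requests, slo_percent):
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    allowed_failures = total_requests * (100 - slo_percent) / 100
    if allowed_failures == 0:
        raise ValueError("no traffic in the window, or a 100% SLO with no budget")
    return 1 - failed_requests / allowed_failures


def should_freeze(total_requests, failed_requests, slo_percent=99.95, threshold=0.30):
    """True when less than `threshold` of the error budget remains."""
    return budget_remaining(total_requests, failed_requests, slo_percent) < threshold


# 1,000,000 requests at 99.95% allows about 500 failures in the window.
print(should_freeze(1_000_000, 100))  # False: about 80% of the budget is left
print(should_freeze(1_000_000, 400))  # True: about 20% of the budget is left
```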

An engineering team uses error budgets to balance speed and reliability.

The payments service has a 99.99% availability SLO. In January, two incidents consume 60% of the error budget by mid-month. The team pauses feature work for two weeks, fixes the root causes, and finishes the month with availability still above the 99.99% target. February starts with a full budget and the team resumes shipping.
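It can help to convert between "share of budget consumed" and "measured availability", since dashboards tend to show one and postmortems the other. A minimal sketch, with hypothetical function names:

```python
def availability_from_budget(slo_percent, consumed_fraction):
    """Measured availability (%) implied by spending a fraction of the error budget."""
    return 100 - consumed_fraction * (100 - slo_percent)


def budget_consumed(slo_percent, measured_percent):
    """Fraction of the error budget spent; values above 1.0 mean the SLO was missed."""
    return (100 - measured_percent) / (100 - slo_percent)


# Spending 60% of a 99.99% budget corresponds to about 99.994% availability.
print(f"{availability_from_budget(99.99, 0.6):.3f}")  # 99.994
```

For a 99.99% target, any measured availability below 99.99% means more than 100% of the budget was spent.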

A product manager questions the SLO target.

The internal admin dashboard has a 99.99% SLO, the same as the customer-facing API. The product manager points out that 10 internal users do not need the same reliability as 50,000 customers. The team drops the dashboard SLO to 99.5%, freeing engineering time for services that matter more.

Frequently asked questions

How do you choose the right SLO target?

Start with what users actually experience. Measure your current performance over 30 days, then set a target slightly below that. A service running at 99.95% might get a 99.9% SLO. The target should be ambitious enough to matter but achievable enough to maintain. Tighten it over time as reliability improves.
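One way to make "slightly below current performance" concrete is to pick the highest conventional target that sits under the measured value. The ladder of targets below is an assumption for illustration, not a standard, and the function name is hypothetical.

```python
# Assumed ladder of conventional targets; adjust to whatever your team uses.
COMMON_TARGETS = (99.0, 99.5, 99.9, 99.95, 99.99)


def suggest_slo(measured_percent, targets=COMMON_TARGETS):
    """Return the highest target strictly below the measured 30-day performance."""
    below = [t for t in targets if t < measured_percent]
    if not below:
        raise ValueError("measured performance is at or below every candidate target")
    return max(below)


print(suggest_slo(99.95))  # 99.9
```

This is only a starting point. The number still has to be checked against what users actually notice.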

What happens when you miss an SLO?

Unlike an SLA, an SLO carries no contractual penalties when you miss it. But a miss should trigger action. Most teams freeze feature deployments, conduct a review of recent incidents, and focus engineering effort on reliability until the error budget recovers. The SLO is a forcing function for prioritization, not a legal obligation.

Related terms

SLI

Service level indicator: a specific metric used to measure the reliability of a service, like latency or error rate.

SLA

Service level agreement: a contractual commitment to specific performance and availability levels.

Error rate

The percentage of requests that fail compared to total requests, usually measured over a rolling time window.

Availability (uptime)

The percentage of time a system is operational and accessible to users.

Incident

An unplanned event that disrupts a service or degrades it below its expected quality, requiring a coordinated response.
