SLA
ess-ell-ay
Service level agreement: a contractual commitment to specific performance and availability levels.
An SLA (service level agreement) is a contractual promise between a service provider and a customer specifying performance guarantees. It defines what the provider commits to (availability, latency, support response times) and what happens when the provider falls short (service credits, refunds).
SLAs have teeth. If AWS commits to 99.9% monthly uptime on S3 and delivers 99.5%, customers can claim service credits. This financial incentive motivates providers to invest in reliability. It also motivates customers to choose providers whose SLAs match their needs.
SLAs should be more conservative than what you actually deliver. If your system runs at 99.95% availability, your SLA might promise 99.9%. The buffer absorbs exceptional events (major incidents, natural disasters) without triggering contractual penalties. Never set an SLA you cannot consistently exceed. Set SLOs internally as stricter targets than the SLA, and use SLIs to measure whether you are meeting them.
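To make the buffer concrete, the sketch below converts an availability percentage into the downtime it permits. It assumes a 30-day month and uses the 99.9% and 99.95% figures from the paragraph above; the function name `allowed_downtime_minutes` is illustrative, not part of any standard tool.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability."""
    total_minutes = period_days * 24 * 60  # 43,200 minutes in a 30-day month (assumed)
    return total_minutes * (1 - availability_pct / 100)


sla_budget = allowed_downtime_minutes(99.9)     # what the SLA promises externally
typical_usage = allowed_downtime_minutes(99.95)  # what the system usually consumes
buffer = sla_budget - typical_usage              # headroom before penalties apply

print(f"SLA allows {sla_budget:.1f} min/month, typical downtime {typical_usage:.1f} min, "
      f"buffer {buffer:.1f} min")
```

Under these assumptions the SLA tolerates roughly twice the downtime the system normally incurs, which is the room that absorbs a bad month.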
Examples
An enterprise customer negotiates an SLA.
The customer requires 99.99% availability and a 15-minute response time for critical incidents. The provider agrees and includes a penalty clause: a 10% service credit for each 0.01 percentage point below the target in a given month. Both parties understand the commitment and the consequences.
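A penalty clause like this one can be sketched as a small function. Two details are assumptions the example does not state: partial steps are rounded down, and the credit is capped at 100% of the monthly bill. Percentages are passed as strings and handled with `Decimal` so that a shortfall such as 0.04 divides cleanly into 0.01 steps.

```python
from decimal import Decimal


def service_credit_pct(target: str, actual: str, credit_per_step: int = 10,
                       step: str = "0.01", cap: int = 100) -> int:
    """Service credit (as % of the monthly bill) for a shortfall against the target."""
    shortfall = Decimal(target) - Decimal(actual)
    if shortfall <= 0:
        return 0  # target met or exceeded: no credit owed
    whole_steps = int(shortfall / Decimal(step))  # assumption: partial steps round down
    return min(whole_steps * credit_per_step, cap)  # assumption: credit capped at 100%


print(service_credit_pct("99.99", "99.95"))  # four steps of 0.01 below target
```

With a 99.99% target, a month at 99.95% is four steps short, so the customer would receive a 40% credit under these assumptions. Steep clauses like this are one reason providers negotiate caps.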
An SLA violation triggers service credits.
A cloud provider has a 4-hour outage affecting one region. Monthly availability for that region drops to about 99.4%, below the 99.9% SLA. Affected customers are eligible for a 25% service credit on their monthly bill. The provider pays out $15M in credits and spends $5M on engineering to prevent recurrence.
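The availability figure in this example follows from simple arithmetic. The sketch below assumes a 30-day month and that the service was fully unavailable for the whole outage; real SLAs define downtime in their own terms, so treat this as an illustration only.

```python
def monthly_availability_pct(downtime_hours: float, days_in_month: int = 30) -> float:
    """Availability for the month, given total hours of full downtime."""
    total_hours = days_in_month * 24  # 720 hours in a 30-day month (assumed)
    return 100 * (1 - downtime_hours / total_hours)


SLA_PCT = 99.9  # the commitment from the example

availability = monthly_availability_pct(4)  # one 4-hour outage
breached = availability < SLA_PCT

print(f"availability {availability:.2f}%, SLA breached: {breached}")
```

Four hours out of 720 is a little over half a percent of the month, which is why a single outage is enough to miss a 99.9% commitment.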
A startup sets its first SLA.
The startup's system has run at 99.95% availability over the past year. They set their standard SLA at 99.5% to give themselves room. Enterprise customers ask for 99.9%. The startup agrees to 99.9% for customers on the Enterprise plan, with a 5% monthly credit for violations.
Frequently asked questions
What is the difference between an SLA, SLO, and SLI?
An SLI (service level indicator) is a metric you measure (like p99 latency). An SLO (service level objective) is an internal target for that metric (p99 under 200ms). An SLA is an external contractual commitment with penalties for violation (99.9% availability or service credits). SLI measures, SLO targets, SLA promises.
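The relationship between the three can be sketched in a few lines. This example uses availability as the SLI, computed from request counts; the thresholds (an internal SLO of 99.95% and an external SLA of 99.9%) and the function names are assumptions chosen for illustration.

```python
SLO_PCT = 99.95  # internal target (assumed value)
SLA_PCT = 99.9   # external contractual commitment (assumed value)


def availability_sli(successful_requests: int, total_requests: int) -> float:
    """SLI: percentage of requests served successfully in the window."""
    if total_requests == 0:
        return 100.0  # no traffic: treat as fully available (a design choice)
    return 100 * successful_requests / total_requests


def evaluate(sli_pct: float) -> str:
    """Compare the measured SLI against the SLO and the SLA."""
    if sli_pct < SLA_PCT:
        return "SLA breached"
    if sli_pct < SLO_PCT:
        return "SLO missed, SLA intact"
    return "healthy"


sli = availability_sli(999_300, 1_000_000)
print(f"SLI {sli:.2f}% -> {evaluate(sli)}")
```

Because the SLO is stricter than the SLA, the middle state acts as an early warning: the team misses its own target before the customer is owed anything.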
Should every service have an SLA?
Every service should have SLOs (internal targets). Only services with paying customers typically need SLAs (external commitments with penalties). Internal tools, staging environments, and free-tier services usually do not have SLAs.
Related terms
Service level objective: an internal reliability target for a service, like 99.9% availability or p99 latency under 200ms.
Service level indicator: a specific metric used to measure the reliability of a service, like latency or error rate.
Availability: the percentage of time a system is operational and accessible to users.
Incident: an unplanned event that disrupts a service or degrades it below its expected quality, requiring a coordinated response.