Uptime and downtime
UP-time / DOWN-time
Uptime is when a service is working and available to users. Downtime is when it is not.
Uptime is the amount of time your service is operational and accessible. Downtime is the amount of time it is not. These are the most basic measures of reliability, and they drive real business outcomes. Every minute of downtime on Amazon.com costs an estimated $220,000 in lost sales. For a startup, downtime might mean losing the deal your sales team is about to close.
Uptime is usually expressed as a percentage over a period. "99.9% uptime" means the service can be down for at most about 43 minutes in a 30-day month. "99.99% uptime" means about 4.3 minutes per month. Each additional nine cuts the allowed downtime by a factor of ten and is disproportionately harder to achieve. Going from 99% to 99.9% might mean adding redundancy. Going from 99.9% to 99.99% might mean redesigning the architecture. Going from 99.99% to 99.999% might mean multi-region failover with no single points of failure.
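The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name downtime_budget and the 30-day month are assumptions made for the example.

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The allowed downtime is the fraction of the period that is NOT covered by the target.
    return period * (1 - uptime_percent / 100)


# Assumption for illustration: a month is exactly 30 days.
MONTH = timedelta(days=30)

for target in (99.0, 99.9, 99.99, 99.999):
    budget = downtime_budget(target, MONTH)
    print(f"{target}% uptime allows {budget.total_seconds() / 60:.2f} minutes of downtime per month")
```

With a 30-day month, a 99.9% target works out to 43.2 minutes and a 99.99% target to 4.32 minutes. Passing timedelta(days=365) instead gives the yearly budget.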
Downtime can be planned or unplanned. Planned downtime (maintenance windows) is scheduled and communicated in advance. Unplanned downtime is caused by bugs, infrastructure failures, or attacks. Modern architectures aim for zero planned downtime by using rolling deployments, blue-green deployments, and database migrations that do not require locks.
Examples
A SaaS company communicates uptime to customers.
The company publishes a status page at status.example.com showing real-time and historical uptime for each service. The API shows 99.97% uptime over the last 90 days. The dashboard shows 99.92%. Prospective customers check the status page before signing a contract. The transparency builds trust, and the numbers give the sales team concrete data for enterprise deals.
A team calculates the cost of downtime.
The e-commerce platform processes $50,000 in orders per hour. An hour of downtime costs $50,000 in direct lost revenue, plus customer service calls, refund processing, and brand damage. The team calculates that investing $200,000 in redundant infrastructure would prevent the two annual outages that cost $100,000 each. The investment pays for itself in one year.
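The payback reasoning in this example can be written out as a short Python sketch. The function names are hypothetical, and the calculation assumes the investment prevents every outage and has no ongoing running cost, which a real analysis would need to check.

```python
def annual_outage_cost(outages_per_year: int, cost_per_outage: float) -> float:
    """Total yearly cost of outages, assuming each one costs the same amount."""
    return outages_per_year * cost_per_outage


def payback_years(investment: float, annual_savings: float) -> float:
    """Years until the avoided outage cost equals the up-front investment."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return investment / annual_savings


# Figures from the example above: two outages a year at $100,000 each,
# and a $200,000 investment in redundant infrastructure.
avoided = annual_outage_cost(outages_per_year=2, cost_per_outage=100_000)
years = payback_years(investment=200_000, annual_savings=avoided)
print(f"Avoided cost per year: ${avoided:,.0f}; payback period: {years:.1f} years")
```

If the new infrastructure prevents only one of the two outages, the same function gives a payback period of two years, so the conclusion depends heavily on how effective the investment really is.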
A company achieves zero planned downtime.
The company used to take the service offline for 30 minutes every Saturday for database maintenance, and customers complained. The team switches to rolling deployments (no code downtime), online schema migrations (no database downtime), and live certificate rotation (no TLS downtime). Planned maintenance windows are eliminated entirely. The status page shows 100% uptime for three consecutive months.
Frequently asked questions
What does 'five nines' uptime mean?
99.999% availability, or about 5 minutes and 15 seconds of downtime per year. Very few services achieve this. It requires redundancy at every layer: multiple data centers, automatic failover, no single points of failure, and deployments that do not cause any interruption. For context, 99.9% (three nines) allows about 8.8 hours of downtime per year. 99.99% (four nines) allows about 53 minutes. Each additional nine is roughly 10x harder and more expensive to achieve.
How do you measure uptime accurately?
Use external monitoring tools (Pingdom, UptimeRobot, Datadog Synthetics) that test your service from multiple locations every minute. Internal monitoring can miss problems that external users see, like DNS failures or CDN outages. Define what 'up' means precisely: is the service up if it responds but with errors? Most teams count a service as down when error rates exceed a threshold (e.g., more than 1% of requests fail) or response times exceed a limit (e.g., p95 latency above 5 seconds).
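One way to make a definition of "up" precise is to evaluate each monitoring window against both thresholds. The Python sketch below is illustrative only: the Request type, the is_up function, and the nearest-rank percentile method are assumptions for this example, not the API of any monitoring product. The default thresholds are the example values from the answer above.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool          # True if the request succeeded
    latency_s: float  # response time in seconds


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    # Integer arithmetic for ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def is_up(requests: list[Request],
          max_error_rate: float = 0.01,
          max_p95_s: float = 5.0) -> bool:
    """Return True if the window meets both the error-rate and the latency limit."""
    if not requests:
        # Assumption: a window with no successful probes at all is treated as down.
        return False
    error_rate = sum(not r.ok for r in requests) / len(requests)
    latency = p95([r.latency_s for r in requests])
    return error_rate <= max_error_rate and latency <= max_p95_s


healthy = [Request(ok=True, latency_s=0.2)] * 100
failing = [Request(ok=True, latency_s=0.2)] * 98 + [Request(ok=False, latency_s=0.2)] * 2
slow = [Request(ok=True, latency_s=0.2)] * 90 + [Request(ok=True, latency_s=6.0)] * 10

print(is_up(healthy), is_up(failing), is_up(slow))
```

Here the healthy window counts as up, while the window with a 2% error rate and the window whose p95 latency is 6 seconds both count as down. Uptime over a period is then the share of windows for which is_up returns True.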
Related terms
Availability: the percentage of time a system is operational and accessible to users.
Service level agreement: a contractual commitment to specific performance and availability levels.
Service level objective: an internal reliability target for a service, like 99.9% availability or p99 latency under 200ms.
Incident: an unplanned event that disrupts a service or degrades it below its expected quality, requiring a coordinated response.
Site Reliability Engineering: an engineering discipline focused on keeping systems running reliably at scale.
