Uptime and downtime

Uptime is the amount of time your service is operational and accessible. Downtime is the amount of time it is not. These are the most basic measures of reliability, and they drive real business outcomes. Every minute of downtime on Amazon.com costs an estimated $220,000 in lost sales. For a startup, downtime might mean losing the deal your sales team is about to close.

Uptime is usually expressed as a percentage over a period. "99.9% uptime" means the service can be down for at most about 43 minutes in a 30-day month. "99.99% uptime" allows about 4.3 minutes per month. Each additional nine cuts the allowed downtime by a factor of ten and is substantially harder to achieve. Going from 99% to 99.9% might mean adding redundancy. Going from 99.9% to 99.99% might mean redesigning the architecture. Going from 99.99% to 99.999% might mean multi-region failover with no single points of failure.
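The monthly figures above follow directly from the percentage and the length of the period. Here is a minimal Python sketch of that arithmetic, assuming a 30-day month and a 365-day year. The function name downtime_budget is illustrative and does not come from any library.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The allowed downtime is the fraction of the period NOT covered by the target.
    return period * (1 - availability_pct / 100)


if __name__ == "__main__":
    month = timedelta(days=30)   # assumption: a 30-day month
    year = timedelta(days=365)   # assumption: a non-leap year
    for target in (99.0, 99.9, 99.99, 99.999):
        per_month = downtime_budget(target, month).total_seconds() / 60
        per_year = downtime_budget(target, year).total_seconds() / 60
        print(f"{target}%: {per_month:.2f} min/month, {per_year:.2f} min/year")
```

Changing the period changes the budget, so an uptime target only means something once the measurement window is stated alongside it.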

Downtime can be planned or unplanned. Planned downtime (maintenance windows) is scheduled and communicated in advance. Unplanned downtime is caused by bugs, infrastructure failures, or attacks. Modern architectures aim for zero planned downtime by using rolling deployments, blue-green deployments, and online database migrations that avoid long-held locks.

Examples

A SaaS company communicates uptime to customers.

The company publishes a status page at status.example.com showing real-time and historical uptime for each service. The API shows 99.97% uptime over the last 90 days. The dashboard shows 99.92%. Prospective customers check the status page before signing a contract. The transparency builds trust, and the numbers give the sales team concrete data for enterprise deals.

A team calculates the cost of downtime.

An e-commerce platform processes $50,000 in orders per hour. An hour of downtime costs $50,000 in direct lost revenue, plus customer service calls, refund processing, and brand damage. The team calculates that investing $200,000 in redundant infrastructure would prevent the two outages per year that cost $100,000 each, a saving of $200,000 annually. The investment pays for itself in one year.
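The payback reasoning in this example can be sketched in a few lines of Python. The function names direct_revenue_loss and payback_years are hypothetical, and the model is deliberately simple: it assumes every outage costs the same and that the investment prevents all of them.

```python
def direct_revenue_loss(revenue_per_hour: float, outage_hours: float) -> float:
    """Direct lost revenue only; excludes support, refund, and brand costs."""
    return revenue_per_hour * outage_hours


def payback_years(investment: float, outages_per_year: float,
                  cost_per_outage: float) -> float:
    """Years until avoided outage costs equal the up-front investment."""
    avoided_per_year = outages_per_year * cost_per_outage
    if avoided_per_year <= 0:
        raise ValueError("avoided cost per year must be positive")
    return investment / avoided_per_year


if __name__ == "__main__":
    # Figures taken from the example above.
    print(direct_revenue_loss(50_000, 1))
    print(payback_years(200_000, 2, 100_000))
```

A real estimate would also account for ongoing infrastructure costs and for outages that the redundancy does not prevent.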

A company achieves zero planned downtime.

The company used to take the service offline for 30 minutes every Saturday for database maintenance. Customers complained. The team switches to rolling deployments (no code downtime), online schema migrations (no database downtime), and live certificate rotation (no TLS downtime). Planned maintenance windows are eliminated entirely. The status page shows 100% uptime for three consecutive months.

Frequently asked questions

What does 'five nines' uptime mean?

99.999% availability, or about 5 minutes and 15 seconds of downtime per year. Very few services achieve this. It requires redundancy at every layer: multiple data centers, automatic failover, no single points of failure, and deployments that do not cause any interruption. For context, 99.9% (three nines) allows about 8.8 hours of downtime per year. 99.99% (four nines) allows about 53 minutes. Each additional nine cuts the downtime budget by a factor of ten, and the cost and effort of meeting it rise steeply.

How do you measure uptime accurately?

Use external monitoring tools (Pingdom, UptimeRobot, Datadog Synthetics) that test your service from multiple locations every minute. Internal monitoring can miss problems that external users see, like DNS failures or CDN outages. Define what 'up' means precisely: is the service up if it responds but with errors? Most teams count a service as down when error rates exceed a threshold (e.g., more than 1% of requests fail) or response times exceed a limit (e.g., p95 latency above 5 seconds).
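A threshold-based definition of 'down' like the one above can be written as a small check over a window of probe results. This is a minimal Python sketch, not how any particular monitoring tool works. The Probe type and the is_down function are hypothetical, the defaults mirror the example thresholds (1% errors, 5-second p95), and p95 is computed with the nearest-rank method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Probe:
    latency_s: float  # response time in seconds
    ok: bool          # False if the request failed or returned an error


def p95(values: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def is_down(window: Sequence[Probe], max_error_rate: float = 0.01,
            max_p95_latency_s: float = 5.0) -> bool:
    """True if the window breaches the error-rate or latency threshold."""
    if not window:
        raise ValueError("window must contain at least one probe result")
    error_rate = sum(1 for p in window if not p.ok) / len(window)
    slow = p95([p.latency_s for p in window]) > max_p95_latency_s
    return error_rate > max_error_rate or slow
```

Uptime over a period is then the share of windows for which is_down returns False, so the window length and thresholds are part of the definition and belong next to the published number.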
