Throughput
THROO-put
The number of operations a system can process in a given time period.
Throughput measures how many operations a system handles per unit of time. For a web API, throughput is typically measured in requests per second (RPS). For a database, it is queries per second. For a message queue, it is messages per second.
Throughput and latency are related but different. Latency is how long one request takes. Throughput is how many requests the system completes per unit of time. A system can have low latency and high throughput (fast and handles many requests) or low latency and low throughput (fast but only handles a few at a time).
Throughput determines capacity. If your API handles 1,000 RPS and peak traffic is 800 RPS, you have headroom. If peak traffic hits 1,200 RPS, requests start queuing, latency spikes, and some requests fail. Understanding throughput lets you plan capacity before you hit limits.
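The headroom arithmetic above can be sketched as a tiny helper; the function name is illustrative, and the numbers come from the example:

```python
def headroom(capacity_rps: float, peak_rps: float) -> float:
    """Fraction of capacity left unused at peak traffic."""
    return (capacity_rps - peak_rps) / capacity_rps

# 1,000 RPS capacity with an 800 RPS peak leaves 20% headroom.
print(f"{headroom(1000, 800):.0%}")  # prints "20%"

# At a 1,200 RPS peak, headroom is negative: requests queue and fail.
print(f"{headroom(1000, 1200):.0%}")  # prints "-20%"
```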
Examples
A load test reveals throughput limits.
The team runs a load test ramping from 100 to 5,000 RPS. At 2,000 RPS, latency starts increasing; at 3,000 RPS, error rates spike. The system's effective throughput limit sits around 2,500 RPS, so the team knows they need to scale before traffic reaches that level.
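A ramping load test like this can be sketched in a few lines of Python. Here `handle_request` is a stand-in that sleeps instead of making a real HTTP call, and the worker count is an arbitrary assumption; a real tool (k6, Locust, wrk) paces arrivals more carefully:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> None:
    """Stand-in for a real HTTP call; the sleep simulates ~1 ms of service time."""
    time.sleep(0.001)

def measure_throughput(n_requests: int, concurrency: int = 50) -> float:
    """Fire n_requests as fast as possible and report achieved requests/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: handle_request(), range(n_requests)))
    return n_requests / (time.perf_counter() - start)

# Ramp the offered load and watch where achieved throughput stops climbing.
for n in (100, 500, 1000):
    print(n, round(measure_throughput(n)))
```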
A database becomes the throughput bottleneck.
The API servers handle 5,000 RPS easily but the database maxes out at 2,000 queries per second. Adding more API servers does not help because the bottleneck is the database. The fix: add read replicas for read-heavy queries and implement caching for frequently accessed data.
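The caching half of that fix can be sketched with Python's `functools.lru_cache`; `get_user` and its return value are hypothetical stand-ins for a real database query:

```python
from functools import lru_cache

DB_QUERIES = 0  # counts how often we actually hit the database

@lru_cache(maxsize=1024)
def get_user(user_id: int) -> dict:
    """Cached read: repeat lookups for the same user skip the database."""
    global DB_QUERIES
    DB_QUERIES += 1
    return {"id": user_id}  # stand-in for a real query result

for _ in range(1000):
    get_user(42)   # 1,000 requests reach the API...

print(DB_QUERIES)  # prints "1" -- only one of them reached the database
```

The database now sees one query instead of a thousand, which is why caching raises end-to-end throughput even though the API servers were never the bottleneck.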
A team measures throughput over time.
The monitoring dashboard shows that throughput peaks at 3pm daily (west coast users arrive while east coast is still active) and drops to near-zero at 4am. The team uses this data to schedule batch jobs during off-peak hours and pre-scale infrastructure before the daily peak.
Frequently asked questions
What is the relationship between latency and throughput?
They are linked under load. As throughput approaches capacity, latency increases because requests compete for resources. A system at 50% capacity has low latency; the same system at 90% capacity has higher latency because requests queue up. Keeping headroom keeps latency low.
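The queueing effect can be illustrated with the textbook M/M/1 formula, where mean time in system is 1 / (capacity - arrival rate); the 1,000 RPS capacity below is an assumed figure for illustration, not a number from this article:

```python
def mm1_latency(capacity_rps: float, arrival_rps: float) -> float:
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda), in seconds."""
    if arrival_rps >= capacity_rps:
        raise ValueError("queue grows without bound at or above capacity")
    return 1.0 / (capacity_rps - arrival_rps)

cap = 1000.0
print(f"{mm1_latency(cap, 500) * 1000:.0f} ms")  # 50% utilization: prints "2 ms"
print(f"{mm1_latency(cap, 900) * 1000:.0f} ms")  # 90% utilization: prints "10 ms"
```

Real systems are messier than M/M/1, but the shape holds: latency climbs steeply as utilization approaches 100%.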
How do you increase throughput?
Horizontal scaling (more servers), caching (reduce database load), database optimization (indexes, read replicas), async processing (move slow operations to background jobs), and load balancing (distribute requests evenly). The most effective approach depends on where the bottleneck is.
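Async processing, one of the options above, can be sketched with a queue and a background worker; the job names and the 10 ms of "slow work" are illustrative:

```python
import queue
import threading
import time

jobs: "queue.Queue[str]" = queue.Queue()
done: list[str] = []

def worker() -> None:
    """Background worker: slow work happens off the request path."""
    while True:
        job = jobs.get()
        time.sleep(0.01)  # simulate the slow operation (e.g. sending an email)
        done.append(job)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
for i in range(5):
    jobs.put(f"email-{i}")  # enqueue returns immediately; the request is done
enqueue_ms = (time.perf_counter() - start) * 1000

jobs.join()  # wait only so we can show the jobs did complete
print(f"enqueue took {enqueue_ms:.2f} ms; {len(done)} jobs processed")
```

The request handler only pays the cost of `put()`, so the API's throughput is no longer bounded by the slow operation.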
Related terms
Latency: The time delay between a request being sent and a response being received.
Horizontal scaling: Adding more servers to handle increased load, instead of upgrading existing servers.
Load balancing: Distributing incoming network traffic across multiple servers so no single server becomes a bottleneck.
Rate limiting: Restricting how many requests a client can make to an API within a time window to prevent abuse and overload.

Want the complete playbook?
Picks and Shovels is the definitive guide to developer marketing: an Amazon #1 bestseller with practical strategies from 30 years of marketing to developers.