Horizontal scaling
hor-ih-ZON-tul SKAY-ling
Adding more servers to handle increased load, instead of upgrading existing servers.
Horizontal scaling means adding more servers to handle more traffic. If one server handles 1,000 requests per second and you need to handle 5,000, you add four more servers behind a load balancer. Each server is identical. Traffic distributes evenly. You have also added redundancy: if one server dies, the others pick up the slack.
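The distribution step above can be sketched as a round-robin load balancer. This is a minimal illustration, not a production balancer; the server names are hypothetical, and the 1,000 req/s per-server figure is the one from the example.

```python
from itertools import cycle

# Hypothetical pool of five identical app servers behind one load balancer.
servers = [f"app-{i}.internal:8080" for i in range(1, 6)]

# Round-robin: each incoming request goes to the next server in the ring,
# so 5,000 req/s spreads to roughly 1,000 req/s per server.
ring = cycle(servers)

def route() -> str:
    """Return the server that should handle the next request."""
    return next(ring)

# Ten requests cycle through all five servers twice.
assignments = [route() for _ in range(10)]
```

Because every server is identical, the balancer needs no knowledge of which server handled a user before; real balancers add health checks so a dead server is dropped from the ring.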
The alternative is vertical scaling: making one server bigger. Horizontal scaling wins at large scale because there is no ceiling. You can always add another server. A single server maxes out at some point, no matter how much RAM or CPU you add. AWS, Google, and every major cloud provider built their businesses on the assumption that you scale horizontally.
Horizontal scaling requires your application to be stateless, or at least to externalize state. If server A stores user session data in memory, a load balancer cannot route that user's next request to server B. The solution: store sessions in Redis, user data in a database, and file uploads in S3. Each server becomes interchangeable. This architectural constraint is why modern applications separate compute from storage.
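The externalized-session pattern can be sketched as follows. In production the store would wrap a Redis client; here a dict stands in so the example is self-contained, and the field names are hypothetical.

```python
import json
import uuid

class SessionStore:
    """Shared session store reachable from every app server.
    In production this would wrap a Redis client; a dict stands in here."""
    def __init__(self):
        self._data = {}

    def save(self, session_id: str, session: dict) -> None:
        self._data[session_id] = json.dumps(session)   # like SET session:<id> <json>

    def load(self, session_id: str):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw else None        # like GET session:<id>

store = SessionStore()  # one shared store, not per-server memory

# Server A handles login and writes the session to the shared store...
sid = str(uuid.uuid4())
store.save(sid, {"user": "alice", "cart": ["sku-123"]})

# ...so server B can serve the user's next request interchangeably.
session = store.load(sid)
```

The point is the design choice: because no server keeps the session in its own memory, the load balancer is free to send any request to any server.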
Examples
A startup scales for a viral launch.
The app runs on 2 servers. A TikTok video drives traffic to 10x normal levels. Auto-scaling adds 18 more servers within 3 minutes. The load balancer distributes traffic across all 20 servers. Users experience no degradation. After the spike subsides, auto-scaling removes the extra servers and costs return to normal.
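The scale-out from 2 to 20 servers comes down to simple replica math. A minimal sketch, assuming a per-server capacity of 1,000 req/s and min/max bounds the team has chosen; real auto-scalers (e.g. AWS Auto Scaling) apply this kind of target-tracking logic with smoothing and cooldowns.

```python
import math

CAPACITY_PER_SERVER = 1_000   # req/s one server handles comfortably (assumed)
MIN_SERVERS, MAX_SERVERS = 2, 50

def desired_servers(req_per_sec: float) -> int:
    """Scale out to cover current load, scale back in when it subsides."""
    needed = math.ceil(req_per_sec / CAPACITY_PER_SERVER)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))

baseline = desired_servers(2_000)    # normal traffic: 2 servers
spike = desired_servers(20_000)      # viral spike: 20 servers
after = desired_servers(2_000)       # back to 2 once the spike passes
```

The cost story in the example falls out of the same function: servers exist only while the load justifies them.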
A team migrates from vertical to horizontal scaling.
The team runs one large database server with 128GB RAM and 32 CPUs. It is at capacity. Upgrading to 256GB costs $8,000/month. Instead, they shard the database across 4 smaller servers at $1,500/month each: $6,000 total for 4x the capacity. Future growth just means adding another shard.
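Spreading one database across 4 smaller servers needs a rule for which shard owns which row. A minimal hash-based sketch, with hypothetical shard names; note that plain modulo hashing forces rebalancing when a shard is added, which is why production systems usually use consistent hashing or range-based shard maps instead.

```python
import hashlib

# Hypothetical shard names; adding capacity means appending a shard
# (and rebalancing existing keys).
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Hash the shard key so rows distribute roughly evenly across shards."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Every lookup for a given user deterministically hits the same shard.
home = shard_for("user-42")
```

Any query that includes the shard key (here, the user ID) touches exactly one of the smaller servers, which is what makes the $1,500 boxes add up to more usable capacity than one big one.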
An architect designs for horizontal scaling from the start.
The architect enforces three rules: no server-side sessions (use Redis), no local file storage (use S3), and no in-memory caches that cannot be shared (use Redis). The application runs identically on one server or one hundred. When the product goes viral six months later, scaling is a configuration change, not an architecture rewrite.
Frequently asked questions
When should you scale horizontally vs vertically?
Scale vertically first when you have a single server that is not at its limits yet. It is simpler. Scale horizontally when you need redundancy (one server dying should not take down your service), when vertical limits are approaching, or when you need to scale beyond what any single machine can handle. Most production services end up scaling horizontally because the ceiling on vertical scaling is real.
Why is horizontal scaling harder for databases?
Application servers are usually stateless, so adding more is trivial. Databases hold state, and splitting that state across servers (sharding) introduces complexity. Queries that span shards are slower. Transactions across shards are difficult. Joins between shards are often impossible. That is why most teams scale their application tier horizontally first and keep their database on a single large server as long as possible.
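The cost of a cross-shard query can be made concrete with a toy scatter-gather. The in-memory "shards" below are stand-ins for separate database servers; the point is the extra round-trips and the merge step that a single-server query would not need.

```python
# Each shard holds a disjoint slice of an orders table (toy in-memory data).
shards = [
    {"orders": [("alice", 40), ("carol", 10)]},
    {"orders": [("bob", 25)]},
    {"orders": [("dave", 60), ("alice", 5)]},
]

def total_spend(user: str) -> int:
    """A single-shard lookup hits one server; an aggregate like this must
    scatter the query to every shard and gather the partial results."""
    partials = []
    for shard in shards:          # one network round-trip per shard in real life
        partials.append(sum(amt for u, amt in shard["orders"] if u == user))
    return sum(partials)          # merge step runs on the query router

alice_total = total_spend("alice")
```

The query is as slow as the slowest shard, and anything transactional across those partial results is where the real difficulty begins.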
Related terms
Vertical scaling: upgrading an existing server with more CPU, RAM, or storage instead of adding more servers.
Load balancing: distributing incoming network traffic across multiple servers so no single server becomes a bottleneck.
Caching: storing a copy of data in a faster location so repeated requests do not hit the slower original source.
Content delivery network: a global network of servers that caches and serves content from locations close to users.
Uptime: the percentage of time a system is operational and accessible to users.

Want the complete playbook?
Picks and Shovels is the definitive guide to developer marketing. Amazon #1 bestseller with practical strategies from 30 years of marketing to developers.