I wrote the book on developer marketing. Literally. Picks and Shovels hit #1 on Amazon.


Load balancing

lohd BAL-un-sing

Distributing incoming network traffic across multiple servers so no single server becomes a bottleneck.

Load balancing distributes incoming requests across multiple servers so that no single server gets overwhelmed. If you have 10,000 requests per second and 5 servers, a load balancer spreads them out, roughly 2,000 per server. When one server goes down, the load balancer routes traffic to the remaining ones. Users never notice. Load balancing is essential for horizontal scaling and availability.
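The arithmetic above can be sketched in a few lines. This is a toy round-robin simulation, not a real load balancer; the server names and request count are illustrative.

```python
from itertools import cycle
from collections import Counter

# Hypothetical pool of 5 backend servers (names are made up).
servers = [f"server-{i}" for i in range(1, 6)]
rr = cycle(servers)  # round-robin: hand out servers in order, forever

# Simulate 10,000 incoming requests being spread across the pool.
counts = Counter(next(rr) for _ in range(10_000))
print(counts)  # each server receives exactly 2,000 requests
```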

There are several load balancing strategies. Round-robin sends each request to the next server in order. Least-connections sends requests to the server handling the fewest active connections. Weighted routing sends more traffic to more powerful servers. The choice depends on your application. Stateless APIs work well with round-robin. Long-lived connections (WebSockets, streaming) are better served by least-connections, because round-robin ignores how long each connection stays open.
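The two non-round-robin strategies reduce to a one-line selection rule each. A minimal sketch, with invented server names and counts:

```python
import random

def least_connections(active: dict[str, int]) -> str:
    """Pick the server currently handling the fewest active connections."""
    return min(active, key=active.get)

def weighted_pick(weights: dict[str, int]) -> str:
    """Pick a server with probability proportional to its capacity weight."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Illustrative state: connection counts per server.
active = {"web-1": 12, "web-2": 3, "web-3": 7}
print(least_connections(active))  # web-2, the least-loaded server
```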

Load balancers operate at different network layers. Layer 4 (TCP/UDP) load balancers route based on IP and port. Layer 7 (HTTP) load balancers can inspect request headers, URLs, and cookies, allowing smarter routing. AWS has ALB (Application Load Balancer) for Layer 7 and NLB (Network Load Balancer) for Layer 4. Cloudflare, Nginx, and HAProxy are other common options.
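The Layer 7 advantage is visibility into the request itself. This toy routing function shows the kind of decision an HTTP-aware load balancer can make; the paths, cookie, and pool names are all hypothetical, and a Layer 4 balancer could not do this because it never parses the HTTP request.

```python
def route_l7(path: str, headers: dict[str, str]) -> str:
    """Toy Layer 7 routing decision based on URL path and headers."""
    if path.startswith("/api/"):
        return "api-pool"       # send API calls to dedicated servers
    if "beta=1" in headers.get("Cookie", ""):
        return "beta-pool"      # cookie-based routing for beta users
    return "web-pool"           # everything else

print(route_l7("/api/users", {}))           # api-pool
print(route_l7("/", {"Cookie": "beta=1"}))  # beta-pool
```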

Examples

An API scales to handle traffic spikes.

An e-commerce site normally runs 4 API servers behind an AWS ALB. On Black Friday, auto-scaling adds 12 more servers. The load balancer distributes traffic across all 16 servers automatically. Peak traffic hits 50,000 requests per second. No single server handles more than 3,500. Response times stay under 100ms.

A load balancer detects an unhealthy server.

One of eight servers starts returning 500 errors after a bad deploy. The load balancer's health check (a GET request to /health every 10 seconds) detects the failure within 20 seconds. It removes the unhealthy server from the pool. Traffic redistributes across the remaining seven servers. Users experience a brief error rate blip, then normal service.

A team chooses between load balancing strategies.

The team runs a chat application with persistent WebSocket connections. Round-robin load balancing causes uneven distribution: some servers have 10,000 connections while others have 500. They switch to least-connections, which sends new connections to the server with the fewest active ones. Connection counts equalize within an hour.
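Why least-connections self-corrects the skew: every new connection goes to the emptiest server, so the underloaded servers absorb all new traffic until counts converge. A small simulation starting from a skewed state like the one above (all numbers illustrative):

```python
def assign(counts: dict[str, int]) -> str:
    """Least-connections: give the new connection to the emptiest server."""
    target = min(counts, key=counts.get)
    counts[target] += 1
    return target

# Skewed state left behind by round-robin (hypothetical servers/counts).
counts = {"chat-1": 10_000, "chat-2": 500, "chat-3": 500}
for _ in range(3_000):
    assign(counts)
print(counts)  # new connections land only on the underloaded servers
```

The overloaded server receives nothing until the others catch up; as old connections close, the counts fully equalize.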


Frequently asked questions

What is the difference between a load balancer and a reverse proxy?

A reverse proxy sits in front of servers and forwards requests to them. A load balancer is a specific type of reverse proxy that distributes traffic across multiple servers. All load balancers are reverse proxies, but not all reverse proxies are load balancers. Nginx, for example, can act as both. A reverse proxy might route to a single backend while also handling SSL termination and caching.

Do you need a load balancer with only one server?

Not strictly, but it is still useful. A load balancer in front of a single server gives you health checking, SSL termination, and the ability to add servers later without changing your DNS or client configuration. When you are ready to scale, you just add servers behind the existing load balancer.
