Rate limiting

Rate limiting caps how many requests a client can make within a given time period. If the limit is 100 requests per minute and a client sends request 101, they get a 429 Too Many Requests response. The server protects itself from being overwhelmed, whether by a misbehaving client, a bot, or a denial-of-service attack.

Rate limits serve multiple purposes. They protect infrastructure from traffic spikes. They ensure fair usage across customers (one customer cannot consume all server capacity). They enforce pricing tiers (free plan gets 100 requests/hour, paid plan gets 10,000). And they provide a natural defense against abuse. Most major public APIs, including Stripe, GitHub, and OpenAI, implement rate limiting.

Implementation varies by use case. Token bucket and sliding window are among the most common algorithms. Rate limits can apply per user, per API key, per IP address, or globally. Good APIs communicate limits clearly. A widely used convention (not a formal standard, so exact names and formats vary by API) is a set of response headers: X-RateLimit-Limit tells clients their cap, X-RateLimit-Remaining shows what is left, and X-RateLimit-Reset indicates when the window resets. This lets well-behaved clients throttle themselves before hitting the wall.
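As an illustration of the token bucket idea, here is a minimal single-process sketch in Python. The class name, the parameters, and the injectable clock are assumptions made for this example, not part of any particular library. A production limiter would normally also need locking and shared storage.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Spend one token if available; return False when the caller should send a 429."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With capacity=100 and refill_rate=100/60, this approximates "100 requests per minute" while still allowing a short burst of up to 100 requests at once.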

Examples

A developer hits a rate limit while building an integration.

The developer's script calls the GitHub API 80 times per minute without authentication. GitHub's rate limit for unauthenticated requests is 60 per hour, so the quota is used up within the first minute. After the 60th request, every subsequent call is rejected with a 403 or 429 response until the window resets. The developer adds authentication (which raises the limit to 5,000/hour) and implements exponential backoff.
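A client can often avoid the rejection entirely by reading the rate-limit headers on each response. The sketch below assumes the headers arrive as a plain dict with exactly these key names and that X-RateLimit-Reset is a Unix timestamp in seconds (GitHub uses that format, but other APIs may send a number of seconds remaining instead). The function name is hypothetical.

```python
def pause_before_next_call(headers, now):
    """Return how many seconds to wait before sending the next request.

    `headers` is assumed to be a dict of response headers and `now` the
    current Unix time in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining > 0:
        return 0.0                      # quota left: no need to wait
    return max(0.0, reset_at - now)     # quota spent: wait for the reset
```

For example, with X-RateLimit-Remaining at 0 and a reset 30 seconds in the future, the function returns 30, and the script can sleep for that long instead of sending calls that are certain to fail.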

A SaaS company uses rate limits to differentiate pricing tiers.

Free tier: 100 API calls per hour. Starter: 1,000 per hour. Pro: 10,000 per hour. Enterprise: custom limits. Each API key is tagged with its tier. The rate limiter checks the key's tier and applies the corresponding limit. When a free-tier user exceeds 100 calls, the response includes an upgrade prompt alongside the 429 error.
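The tier lookup described above could be sketched as a fixed-window counter keyed by API key. Everything here is illustrative: the table, the class, the error body, and the upgrade message are hypothetical, and enterprise keys are simply left unlimited because their custom limits are not specified.

```python
import time

# Hypothetical tier table mirroring the limits above (calls per hour).
# None stands for "custom limit", which this sketch does not enforce.
TIER_LIMITS = {"free": 100, "starter": 1_000, "pro": 10_000, "enterprise": None}
WINDOW_SECONDS = 3600


class TieredLimiter:
    def __init__(self, key_tiers, clock=time.time):
        self.key_tiers = key_tiers   # api_key -> tier name
        self.clock = clock
        self.counters = {}           # (api_key, window index) -> calls so far
                                     # (old windows are never purged in this sketch)

    def check(self, api_key):
        """Return (status_code, body) for one call made with api_key."""
        tier = self.key_tiers[api_key]
        limit = TIER_LIMITS[tier]
        if limit is None:
            return 200, {}
        window = int(self.clock() // WINDOW_SECONDS)
        slot = (api_key, window)
        count = self.counters.get(slot, 0)
        if count >= limit:
            body = {"error": "rate_limit_exceeded"}
            if tier == "free":
                # Upgrade prompt returned alongside the 429.
                body["message"] = "Upgrade to Starter for 1,000 calls per hour."
            return 429, body
        self.counters[slot] = count + 1
        return 200, {}
```

A fixed window is the simplest choice, but it lets a client send up to twice the limit in a short span that straddles two windows. A sliding window or token bucket smooths that out.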

A platform mitigates a DDoS attack with rate limiting.

A bot network sends 500,000 requests per second from 10,000 IP addresses. The rate limiter detects that each IP is sending 50 requests per second, far above the 10/second per-IP limit. It blocks the offending IPs at the edge. Legitimate traffic (averaging 2 requests per second per IP) flows through normally. The attack is absorbed without impacting users.
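The per-IP check in this scenario (500,000 requests per second spread over 10,000 addresses is 50 per second per IP, against a limit of 10) can be sketched with a sliding window log. The class and its parameters are assumptions for illustration. Unlike the edge block described above, this sketch only rejects the excess requests and does not ban the address outright.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP in any `window`-second span."""

    def __init__(self, limit=10, window=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)   # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False                 # over the per-IP limit
        q.append(now)
        return True
```

An address sending 50 requests in one second gets 10 accepted and 40 rejected, while an address sending 2 per second never comes near the limit.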

Frequently asked questions

What happens when a client gets rate limited?

The server returns HTTP 429 Too Many Requests. Good APIs include a Retry-After header telling the client how long to wait, given either as a number of seconds or as an HTTP date. Well-built clients implement exponential backoff: wait 1 second, then 2, then 4, then 8. Poorly built clients retry immediately and keep hitting the limit. The best approach is to check the X-RateLimit-Remaining header and throttle proactively before reaching the limit.
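A rough sketch of that retry behavior follows. The `send` callable is hypothetical: it stands in for whatever HTTP call the client makes and is assumed to return a (status_code, headers) pair with headers as a plain dict. Only the seconds form of Retry-After is handled here. A date value falls back to the exponential delay.

```python
import time


def call_with_backoff(send, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_retries:
            break                                  # give up, return the last 429
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)             # the server told us how long to wait
        else:
            delay = base_delay * 2 ** attempt      # 1, 2, 4, 8, ...
        sleep(delay)
    return status, headers
```

Real clients usually add random jitter to each delay so that many clients throttled at the same moment do not all retry in lockstep.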

How do you choose the right rate limit?

Start by analyzing actual usage patterns. If your median user makes 50 requests per minute and your p99 user makes 200, a limit of 300 per minute protects your infrastructure without affecting real users. Set limits per tier if you have paid plans. Monitor 429 response rates: if more than 1% of requests are getting rate-limited, your limits might be too tight or a client might need help optimizing their usage.
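The arithmetic behind that recommendation can be sketched in a few lines. The function names are hypothetical, and the 1.5x headroom factor is an assumption inferred from the example above (p99 of 200 leading to a limit of 300). It is not a general rule.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_limit(requests_per_minute, headroom=1.5):
    """Propose a per-minute limit: the p99 user's rate plus some headroom."""
    return math.ceil(percentile(requests_per_minute, 99) * headroom)


def throttled_share(status_codes):
    """Fraction of responses that were 429s, for comparison with the 1% guideline."""
    return sum(1 for s in status_codes if s == 429) / len(status_codes)
```

Feeding in per-user request rates from logs gives a starting limit, and running throttled_share over recent response codes shows whether that limit is too tight in practice.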
