I wrote the book on developer marketing. Literally. Picks and Shovels hit #1 on Amazon.


Latency

LAY-ten-see

The time delay between a request being sent and a response being received.

Latency is the time between a user's action and the system's response. Click a button: the time until something happens is the latency. Make an API call: the time until the response arrives is the latency. Latency is measured in milliseconds (ms).
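In code, measuring latency usually comes down to timestamping before and after an operation. A minimal Python sketch (the helper name is illustrative, not a standard API):

```python
import time

def measure_latency_ms(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Time a stand-in for some unit of work.
total, ms = measure_latency_ms(sum, range(1_000_000))
print(f"took {ms:.2f} ms")
```

`time.perf_counter` is preferred over `time.time` here because it is monotonic and not affected by system clock adjustments.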

Low latency is invisible: users click and things happen. High latency is painful: users click and wait. The business impact is measurable; Amazon famously found that every 100ms of added latency cost them 1% in sales.

Latency has many sources: network distance (a round trip from New York to Singapore takes roughly 250ms), server processing time, database query time, and client-side rendering time. Reducing latency means attacking each source: CDNs shorten network distance, caching avoids repeated database queries, and code optimization cuts processing time.
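Network distance has a hard physical floor: light in optical fiber travels at roughly two-thirds the speed of light in a vacuum, so a back-of-the-envelope calculation shows why intercontinental round trips cost hundreds of milliseconds before any server work happens (the 15,000 km figure is an approximate great-circle distance):

```python
# Light in optical fiber travels at roughly 2/3 the speed of light in
# a vacuum, about 200,000 km/s. That sets a lower bound on latency.
FIBER_SPEED_KM_PER_S = 200_000

def min_rtt_ms(distance_km):
    """Theoretical minimum round-trip time in milliseconds."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000

# New York to Singapore is roughly 15,000 km great-circle distance.
print(f"{min_rtt_ms(15_000):.0f} ms")  # prints "150 ms"
```

Real routes are longer than the great-circle path and add switching and queuing delays, which is how 150ms of physics becomes ~250ms in practice.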

Examples

An API endpoint is too slow.

The /dashboard endpoint takes 800ms to respond. Profiling reveals: 50ms of network time, 600ms for a complex database query, 100ms for serialization, and 50ms for the response. The team adds an index to the query (600ms drops to 30ms) and caches the result for 60 seconds. Total latency: 230ms.

A global application has uneven latency.

Users in the US experience 50ms latency (servers are in Virginia). Users in Australia experience 350ms. The team deploys edge servers in Sydney and Singapore. Australian latency drops to 60ms. CDN costs increase $200/month but Australian user engagement increases 25%.

A team measures percentile latency.

Average latency is 100ms, which looks fine. But p99 latency is 3 seconds, meaning 1% of requests take 3+ seconds. These are the requests users remember. The team investigates the p99 outliers and finds they all hit a slow third-party API. They add a timeout and cache.
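Percentiles are easy to compute from raw samples. The sketch below uses a simplified sort-and-index estimate (production systems typically use histograms instead) to show how the average hides the tail:

```python
def percentile(samples_ms, p):
    """Rough percentile estimate: sort and index p% of the way through.
    Good enough for a sketch; real monitoring tools use histograms."""
    ordered = sorted(samples_ms)
    index = min(len(ordered) - 1, p * len(ordered) // 100)
    return ordered[index]

# 99 requests at 100ms and one 3-second outlier.
latencies = [100] * 99 + [3000]
print(sum(latencies) / len(latencies))   # average: 129.0 — looks fine
print(percentile(latencies, 50))         # p50: 100
print(percentile(latencies, 99))         # p99: 3000 — the tail users feel
```

One slow request in a hundred barely moves the average but dominates the p99, which is why tail latency deserves its own alerting.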


Frequently asked questions

What is a good latency for a web API?

Under 200ms for user-facing requests. Under 50ms for real-time interactions (search autocomplete, chat). Under 500ms for complex operations (report generation, bulk operations). Users perceive anything over 300ms as 'slow.'

What is the difference between latency and response time?

They are often used interchangeably but technically differ. Latency is the network delay between request and response. Response time is latency plus server processing time. In practice, when people say 'latency' they usually mean end-to-end response time.

