Embeddings
em-BED-ings
Numerical representations of text that capture semantic meaning. Two similar sentences produce similar numbers, enabling AI-powered search.
An embedding converts text into a list of numbers, typically 1,536 or 3,072 numbers long. These numbers capture the meaning of the text, not just the words. "How do I deploy to production?" and "What are the steps to ship my app?" have different words but similar embeddings because they mean roughly the same thing.
This is what makes semantic search possible. Traditional keyword search matches exact words. Embedding-based search matches meaning. A developer searching your docs for "authentication" will also find content about "login," "sign in," and "access control" because those concepts have similar embeddings.
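"Similar embeddings" is usually measured with cosine similarity: the cosine of the angle between two vectors, where 1.0 means identical direction. A minimal sketch in plain Python, using toy 3-dimensional vectors as stand-ins for real 1,536-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: "authentication" and "login" point in similar directions,
# "banana" points elsewhere. Real embeddings have 1,536+ dimensions.
auth = [0.9, 0.1, 0.2]
login = [0.8, 0.2, 0.3]
banana = [0.1, 0.9, 0.1]

print(cosine_similarity(auth, login))   # high, ~0.98
print(cosine_similarity(auth, banana))  # low, ~0.24
```

The values themselves are invented for illustration; the point is that semantically related text scores close to 1.0 and unrelated text scores much lower.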
OpenAI's text-embedding-3-small model costs $0.02 per million tokens, so you can embed your entire documentation set for a few dollars. Store the embeddings in a vector database like Pinecone, Weaviate, or Supabase's pgvector. When a user asks a question, embed the question, find the closest document embeddings, and feed the matching documents to an LLM. That's RAG.
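The retrieval step described above can be sketched as: store (text, embedding) pairs, embed the query, rank by cosine similarity. The hand-written vectors below are toy stand-ins; in production the vectors would come from the embedding API, and the ranking loop would be a vector-database query (e.g. pgvector's `<=>` distance operator) rather than a Python sort.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: each doc carries a pre-computed embedding. In a real
# pipeline these vectors come from the embedding API, not by hand.
docs = [
    ("Deploying to production", [0.9, 0.1, 0.1]),
    ("Rate limits and quotas",  [0.1, 0.9, 0.2]),
    ("Billing and invoices",    [0.1, 0.2, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return titles of the k docs whose embeddings are closest to the query's."""
    ranked = sorted(docs,
                    key=lambda d: cosine_similarity(query_embedding, d[1]),
                    reverse=True)
    return [title for title, _ in ranked[:k]]

# A query like "How do I ship my app?" would embed near the deployment doc.
print(retrieve([0.8, 0.2, 0.1]))  # ['Deploying to production']
```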
Examples
Building semantic search for developer docs.
You embed every page of your documentation using OpenAI's embedding API. Each page becomes a vector of 1,536 numbers stored in Supabase pgvector. When a developer searches for "rate limiting," the system finds pages about throttling, quotas, and API limits, even if those pages never use the phrase "rate limiting."
Detecting duplicate support tickets.
A support team embeds all incoming tickets. When a new ticket arrives, the system compares its embedding to recent tickets. If two tickets have cosine similarity above 0.92, they are likely about the same issue. The system links them automatically.
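The duplicate check is a single comparison against a similarity threshold. A sketch with toy vectors standing in for real ticket embeddings (the 0.92 cutoff comes from the example above; tune it on your own data):

```python
import math

DUPLICATE_THRESHOLD = 0.92

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_duplicates(new_embedding, recent_tickets):
    """Return ids of recent tickets close enough to be the same issue."""
    return [ticket_id for ticket_id, emb in recent_tickets
            if cosine_similarity(new_embedding, emb) > DUPLICATE_THRESHOLD]

recent = [
    ("T-101", [0.70, 0.70, 0.10]),   # "login broken on mobile"
    ("T-102", [0.10, 0.20, 0.95]),   # "invoice PDF missing"
]
new_ticket = [0.68, 0.72, 0.12]      # "can't sign in from my phone"
print(find_duplicates(new_ticket, recent))  # ['T-101']
```

The new ticket scores well above 0.92 against T-101 and far below it against T-102, so only T-101 is linked.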
Content recommendation engine.
A developer reads a blog post about database indexing. The system compares that post's embedding to all other posts. It recommends three posts about query optimization and one about caching because those topics are semantically close, even though they share few keywords.
Related terms
Vector database
A database optimized for storing and searching embeddings. The backbone of every RAG pipeline and semantic search system.
Retrieval-augmented generation (RAG)
Fetching relevant data and feeding it to an LLM so the response is grounded in real, current information instead of training data alone.
Token
The smallest unit of text an LLM processes. Roughly 4 characters or 3/4 of a word. Tokens determine cost and context limits.
Large language model (LLM)
A neural network trained on massive text data to generate and understand language. The technology behind ChatGPT, Claude, and Gemini.

Want the complete playbook?
Picks and Shovels is the definitive guide to developer marketing. Amazon #1 bestseller with practical strategies from 30 years of marketing to developers.