Retrieval-augmented generation
RAG (rhymes with bag)
Fetching relevant data and feeding it to an LLM so the response is grounded in real, current information instead of training data alone.
RAG is a pattern where you fetch relevant documents first, then pass them to an LLM along with the user's question. The model generates its answer based on the retrieved content, not just its training data. This solves two problems: the model stays current (training data has a cutoff), and the model stays accurate (it answers from your actual documents, not its memory, reducing hallucinations).
The typical RAG pipeline works like this. A user asks a question. Your system converts the question into an embedding. It searches a vector database for the most similar documents. It passes those documents plus the question to the LLM. The LLM generates an answer grounded in the retrieved content.
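Those steps can be sketched in a few lines of Python. Everything here is a stand-in: the "embedding" is a toy bag-of-words counter and the "vector database" is an in-memory list, where a real pipeline would call an embedding model and a vector store, then send the assembled prompt to an LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: word counts. A real pipeline would call an
    # embedding model and get back a dense vector instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Similarity between two embeddings; higher means more related.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for a vector database: each document stored with its embedding.
docs = [
    "Row Level Security restricts which rows each user can read or write.",
    "Edge Functions run server-side code close to your users.",
]
index = [(d, embed(d)) for d in docs]

def retrieve(question, k=1):
    # Search step: rank documents by similarity to the question.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

def build_prompt(question):
    # The retrieved documents plus the question become the LLM prompt;
    # the model's answer is grounded in this context, not its memory.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("How do I set up Row Level Security?")` retrieves the Row Level Security document, not the Edge Functions one, and hands both context and question to the model in a single prompt.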
Every company building an AI-powered support bot, documentation assistant, or internal knowledge tool is using some version of RAG. Vercel's AI SDK has RAG built in. LangChain and LlamaIndex are frameworks specifically designed for RAG pipelines. If your documentation is well-structured, RAG makes your product easier for AI to recommend accurately.
Examples
An AI-powered documentation assistant.
Supabase built an AI assistant that answers questions about their platform. When a developer asks "How do I set up Row Level Security?", the system retrieves the relevant docs pages, passes them to the LLM, and generates a step-by-step answer with links to the original documentation.
A sales team knowledge base.
A sales rep asks the internal AI: "What is our competitive positioning against Datadog?" RAG retrieves the latest battle card, recent win/loss reports, and pricing comparisons. The LLM synthesizes a concise answer from current internal documents, not outdated training data.
A customer support chatbot with current data.
Without RAG, the bot only knows what was in its training data. With RAG, it retrieves the customer's account info, recent support tickets, and current product docs before answering. The response is specific, current, and grounded in real data.
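A minimal sketch of that grounding step, with all names and data hypothetical: the retrieved account record, tickets, and doc snippets are simply assembled into the prompt at request time, before the model is called.

```python
def build_support_prompt(question, account, tickets, doc_snippets):
    # Everything below comes from live systems at request time,
    # not from the model's training data.
    context_parts = [
        "Account: " + ", ".join(f"{k}={v}" for k, v in account.items()),
        "Recent tickets:\n" + "\n".join(f"- {t}" for t in tickets),
        "Docs:\n" + "\n".join(f"- {d}" for d in doc_snippets),
    ]
    return (
        "Answer the customer's question using only the context below.\n\n"
        + "\n\n".join(context_parts)
        + f"\n\nQuestion: {question}"
    )

# Hypothetical request-time data for illustration.
prompt = build_support_prompt(
    "Why was I charged twice this month?",
    account={"plan": "Pro", "renewal": "2024-06-01"},
    tickets=["#4312: duplicate invoice reported"],
    doc_snippets=["Billing retries can create a pending duplicate "
                  "charge that is voided within 3 days."],
)
```

The resulting prompt contains the customer's actual plan, their open ticket, and the current billing doc, so the model's answer is specific to this customer rather than a generic guess.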
Frequently asked questions
When should I use RAG versus fine-tuning?
Use RAG when you need current, factual answers from specific documents. Use fine-tuning when you need the model to learn a new style, format, or domain-specific behavior. RAG is for knowledge. Fine-tuning is for skill. Many production systems use both.
Related terms
Embeddings. Numerical representations of text that capture semantic meaning. Two similar sentences produce similar numbers, enabling AI-powered search.
Vector database. A database optimized for storing and searching embeddings. The backbone of every RAG pipeline and semantic search system.
Large language model (LLM). A neural network trained on massive text data to generate and understand language. The technology behind ChatGPT, Claude, and Gemini.
Context window. The maximum amount of text an LLM can process in a single request. Measured in tokens. Bigger windows handle more information at once.
Hallucination. When an AI model generates confident but factually incorrect output. It sounds right. It reads well. It is wrong.

Want the complete playbook?
Picks and Shovels is the definitive guide to developer marketing. Amazon #1 bestseller with practical strategies from 30 years of marketing to developers.