What is RAG? Retrieval-Augmented Generation Explained
RAG (Retrieval-Augmented Generation) is the technique that lets a large language model answer questions using your own documents, databases, and knowledge bases, rather than guessing from training data alone. It is one of the most widely deployed enterprise LLM architectures in production today.
This guide covers how RAG works, its architecture, key components, vector databases, chunking strategies, real-world use cases, production challenges, and what AI engineers need to know to build production-grade RAG systems.
RAG: Quick Facts
| Item | Explanation |
|---|---|
| Full form | Retrieval-Augmented Generation |
| Main purpose | Ground LLM answers in external documents, databases, or knowledge bases |
| Used with | LLMs (OpenAI GPT-4o, Anthropic Claude, Google Gemini) + vector databases |
| Key components | Document loader, text splitter, embedding model, vector database, retriever, reranker, prompt template, LLM, evaluator |
| Common use cases | Internal knowledge assistants, HR bots, support chatbots, legal document search, clinical research assistants, sales enablement |
| Main benefit | Accurate, grounded, citable answers from private or up-to-date data — without retraining the LLM |
| Main limitation | Answer quality depends entirely on retrieval quality — if the wrong chunks are retrieved, the answer will be wrong or hallucinated |
| Primary frameworks | LangChain, LlamaIndex |
| Evaluation tool | RAGAS (faithfulness, answer relevancy, context precision, context recall) |
| Related Technovids training | AI Engineering Course · Production AI Engineering |
What is RAG in AI?
RAG (Retrieval-Augmented Generation) is a technique that gives a large language model access to external information at the time it generates a response. Instead of relying only on what it learned during training, the model first retrieves relevant passages from a document store or database, then uses those passages as context when it generates its answer.
Simple analogy
Imagine answering a difficult exam question. A standard LLM is like a student who can only use what they memorised. A RAG system is like a student who can look up specific passages in a set of reference books before writing the answer — and then cites the source they used.
For a brief one-paragraph definition, see our short RAG glossary definition. This page goes much deeper — covering the full architecture, components, and production engineering considerations.
Why RAG is Needed: LLM Limitations
Large language models are powerful but have five fundamental limitations that make them impractical for many enterprise use cases without augmentation:
Outdated knowledge
LLMs have a training cutoff date. They do not know about events, policy changes, product updates, or new research that occurred after training. A model trained on data up to mid-2024 cannot answer questions about changes in 2025 or 2026.
Hallucination risk
When asked about specific facts it does not know, an LLM will often generate plausible-sounding but incorrect answers rather than saying "I don't know." This is especially dangerous for legal, medical, HR, or financial contexts where accuracy is critical.
No private data access
LLMs are trained on public data. They have no knowledge of your company's internal documents, policies, product specifications, customer data, or proprietary processes — unless you provide that information explicitly.
No enterprise data context
Every organisation's data is unique. Industry-specific terminology, internal processes, custom product names, and organisational hierarchy are not in any publicly trained model. RAG lets you inject this context at query time.
Weak source citation
A standard LLM cannot tell you which document, page, or record an answer came from — because it has no document. RAG retrieves specific source chunks, making grounded, citable answers possible.
How RAG Works: Step by Step
A RAG system has two phases: an indexing phase (run when documents are first loaded, and again whenever they change) and a query phase (run every time a user asks a question).
Indexing Phase (one-time setup)
Load documents
Source documents — PDFs, Word files, HTML, Markdown, databases — are loaded using document loaders. Text is extracted and normalised.
Chunk the text
Documents are split into smaller pieces called chunks. Each chunk is typically 100–500 tokens with a small overlap to preserve context across boundaries.
Embed each chunk
Each text chunk is converted to a vector — a list of numbers that captures its semantic meaning — using an embedding model like OpenAI text-embedding-3-small.
Store in vector database
The vectors (and the original text of each chunk) are stored in a vector database like Pinecone, Chroma, or pgvector, ready for semantic search.
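The indexing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, splitting on blank lines stands in for a proper text splitter, and a Python list stands in for the vector database.

```python
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(documents):
    """Chunk each document on blank lines, embed each chunk, keep text, vector and source."""
    index = []
    for source, text in documents.items():
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for position, chunk in enumerate(paragraphs):
            index.append({
                "source": source,        # kept so answers can cite it later
                "position": position,    # order of the chunk within its document
                "text": chunk,           # original text is stored next to the vector
                "vector": toy_embed(chunk),
            })
    return index

# Illustrative documents (made up for this example)
docs = {
    "handbook.md": "Paternity leave is 15 days.\n\nLeave must be taken within 90 days.",
    "it-guide.md": "Reset your password from the self-service portal.",
}
index = build_index(docs)
print(len(index), index[0]["source"])
```

Storing the source and position alongside each vector is what later makes citations possible at query time.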
Query Phase (every user question)
User asks a question
The user submits a query — via a chat UI, API call, or application form.
Embed the query
The query is converted to a vector using the same embedding model used during indexing.
Search the vector database
The query vector is compared against the stored chunk vectors, usually through an approximate nearest-neighbour index rather than an exhaustive scan. The top-K most semantically similar chunks are retrieved.
Optional: rerank
A reranking model (such as Cohere Rerank) re-scores the top-K chunks for precision, ensuring the most relevant chunks go into the prompt.
Build the prompt
Retrieved chunks are inserted into a prompt template alongside the user query. The template instructs the LLM to answer only from the provided context.
Generate the answer
The LLM (GPT-4o, Claude, Gemini) receives the assembled prompt and generates a grounded answer based on the retrieved context.
Return answer with citations
The system returns the answer along with references to the source documents or chunks, so the user can verify the information.
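The query phase above can be sketched the same way. This is an illustrative sketch only: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, the chunk list stands in for a vector database, and the final LLM call is left out because it depends on the provider you use.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Embed the query with the same model as the chunks and return the top-k matches."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved):
    """Insert retrieved chunks, tagged with their source, into a constrained prompt."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so. Cite the source in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Illustrative chunks (made up for this example)
chunks = [
    {"source": "Handbook 7.4", "text": "The company provides 15 days of paid paternity leave."},
    {"source": "Handbook 3.1", "text": "Office hours are 9 to 5 on weekdays."},
    {"source": "IT Guide 2", "text": "Reset your password from the self-service portal."},
]
question = "How many days of paid paternity leave am I entitled to?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string is what would be sent to the LLM
```

Tagging each chunk with its source inside the prompt is one simple way to let the model return citations the user can check.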
RAG Architecture Diagram
The complete RAG pipeline, shown as two flows: indexing, then query.
Indexing Pipeline (one-time)
Documents (PDF, Word, HTML) → Chunker (text splitter) → Embedding Model (text-embedding-3) → Vector Database (Pinecone / Chroma)
Query Pipeline (every user query)
User Query (natural language) → Embed Query (same model) → Vector Search (top-K chunks) → Reranker (optional) → Prompt Template (context + query) → LLM (GPT-4o / Claude) → Answer + Citations (grounded response)
Key Components of a RAG Pipeline
Documents
The raw source material — PDFs, Word docs, web pages, databases, markdown files. Document quality directly determines answer quality. Poorly formatted or OCR-errored documents produce poor answers.
Chunker
Splits documents into retrieval-sized pieces. The chunking strategy (size, overlap, splitting method) is one of the most impactful configuration decisions in a RAG system.
Embedding Model
Converts text to vectors. OpenAI text-embedding-3-small is a common choice; open-source alternatives include BAAI/bge and nomic-embed. The same model must be used for both indexing and querying.
Vector Database
Stores chunk embeddings and enables fast approximate nearest-neighbour search. Common options are Pinecone (managed), Chroma (local), FAISS (in-memory), pgvector (Postgres), Weaviate (schema-native), and Qdrant (self-hosted or cloud).
Retriever
Fetches the top-K most similar chunks for a given query. Can be vector-only (semantic), keyword-only (BM25), or hybrid (both combined via Reciprocal Rank Fusion).
Reranker
A cross-encoder model, such as Cohere Rerank or bge-reranker, that re-scores the top-K retrieved chunks. Reranking typically improves retrieval precision before generation, at the cost of some added latency.
Prompt Template
Structures the retrieved context and user query into an LLM prompt. Instructs the model to answer only from context, cite sources, and refuse to speculate when context is insufficient.
LLM
The language model that generates the final answer — GPT-4o, Claude Sonnet, Gemini Pro, or an open-source model. The LLM's context window limits how many retrieved chunks can be included.
Response Evaluator
Measures retrieval and generation quality. RAGAS provides faithfulness, answer relevancy, context precision and context recall metrics. LangSmith provides request-level tracing and monitoring.
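The Retriever component above mentions combining vector and keyword results via Reciprocal Rank Fusion. A minimal sketch of that fusion step follows; the chunk IDs and the two input rankings are made up, and `k=60` is a commonly used default constant rather than something this guide prescribes.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs: each list adds 1 / (k + rank) to a chunk's score."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking = ["c3", "c1", "c7"]    # hypothetical semantic search result
keyword_ranking = ["c1", "c9", "c3"]   # hypothetical BM25 result
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Because the fusion uses only ranks, it does not need the vector scores and keyword scores to be on comparable scales. Chunks that appear in both lists (`c1` and `c3` here) rise above chunks found by only one retriever.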
RAG in Action: A Company Policy Assistant
Consider a company that has a 200-page employee handbook. Without RAG, employees have to search the document manually. With RAG, they ask a question and get a grounded answer with the source section cited.
User question
"How many days of paid paternity leave am I entitled to?"
Retrieved context (from vector search)
"...Effective January 2025, the company provides 15 days of paid paternity leave for employees who have completed 6 months of continuous service. Leave must be taken within 90 days of the child's birth or adoption date... [Employee Handbook, Section 7.4 — Parental Leave]"
Generated answer with citation
"You are entitled to 15 days of paid paternity leave, provided you have completed at least 6 months of continuous service. Leave must be taken within 90 days of the birth or adoption date. [Source: Employee Handbook, Section 7.4]"
Without RAG (standard LLM chatbot)
"Paternity leave policies vary by country and company. In India, the Paternity Benefit Bill proposes 15 days... However, I do not have access to your company's specific policy."
Result: Generic, ungrounded answer. Potentially wrong. No citation.
The RAG system gives a precise, grounded, citable answer from the actual company document. The non-RAG chatbot gives a generic, potentially misleading response with no company-specific information.
RAG vs Normal LLM Chatbot
| Dimension | Normal LLM Chatbot | RAG System |
|---|---|---|
| Knowledge source | Training data only — fixed at model cutoff | Training data + retrieved external documents in real time |
| Private data access | None — has no knowledge of your specific documents | Yes — indexes and retrieves from your own document store |
| Source citations | Cannot cite sources — does not have them | Cites the specific document, section, or page retrieved |
| Hallucination risk | High for domain-specific or recent queries | Lower — answer is grounded in retrieved context |
| Knowledge updates | Requires retraining or prompt-stuffing | Real-time — update the document store, no retraining |
| Enterprise use cases | Limited — suitable for general-purpose tasks | Designed for internal knowledge, policies, compliance, support |
| Auditability | Low — no source traceability | High — every answer traces to a source document chunk |
RAG vs Fine-Tuning
RAG and fine-tuning are complementary rather than competing approaches. They solve different problems. Many production AI systems use both — RAG for factual, grounded knowledge retrieval and fine-tuning for adapting model behavior, tone, or task format.
| Dimension | RAG | Fine-Tuning |
|---|---|---|
| Knowledge update | Real-time — update the document store | Requires full or partial retraining |
| Dynamic or changing data | Excellent — retrieves the latest indexed content | Poor — stale after training cutoff |
| Style and behavior | Limited — responds from retrieved context | Excellent — teaches model new formats and tone |
| Private data access | Yes (indexed ahead of time, retrieved at inference time) | Yes (baked into model weights, harder to update) |
| Cost | Lower — inference + vector DB | Higher — training compute + storage |
| Auditability | High — every answer traces to source chunks | Low — knowledge is in opaque weights |
| Best for | Factual queries on dynamic or private data | Task patterns, writing style, response format |
Rule of thumb
If the problem is "the model doesn't know our specific information" — use RAG. If the problem is "the model doesn't behave the way we want" — consider fine-tuning. Most enterprise knowledge assistant projects need RAG, not fine-tuning.
For a full decision framework — use cases, cost comparison, data requirements, risks, and when to combine both — see the RAG vs Fine-Tuning comparison guide.
Vector Databases in RAG
A vector database is the retrieval engine of a RAG system. It stores text chunks as high-dimensional vectors (embeddings) and enables fast semantic search — finding content by meaning rather than exact keyword match.
How vector similarity works
When you embed the query "maternity leave policy" and embed the chunk "Section 7.3 — Parental Leave: Employees are entitled to...", the two vectors are geometrically close in the embedding space — even though the words are different. This is semantic search: finding content by meaning, not keyword.
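"Geometrically close" is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embedding vectors have hundreds or thousands of dimensions and come from the embedding model, not from hand-written numbers.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings (values invented for this example)
query_vec = [0.9, 0.1, 0.3]      # "maternity leave policy"
parental_vec = [0.8, 0.2, 0.4]   # the Parental Leave chunk
password_vec = [0.1, 0.9, 0.0]   # an unrelated password-reset chunk

close = cosine_similarity(query_vec, parental_vec)
far = cosine_similarity(query_vec, password_vec)
print(round(close, 2), round(far, 2))
```

The parental leave chunk scores far higher than the unrelated chunk even though it shares no exact keywords with the query, which is the behaviour semantic search relies on.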
Chroma
Local development: Simple Python-native setup. No infrastructure needed. Persists to local disk. Ideal for prototyping, demos, and small-scale RAG projects.
Pinecone
Production (managed): Fully managed cloud vector database. Scalable, fast, with built-in metadata filtering and namespacing. A popular choice for production RAG deployments.
FAISS
In-memory / research: Facebook AI Similarity Search. Extremely fast for in-memory use cases. No persistence by default. Good for prototyping and research when scale is not a concern.
pgvector
PostgreSQL integration: A PostgreSQL extension that adds vector storage and similarity search. Best for teams already running Postgres who want to avoid adding a new infrastructure component.
Weaviate
Schema-native / hybrid: Supports complex data schemas, hybrid search (vector + BM25), and built-in object storage. Good for RAG applications with rich metadata filtering requirements.
Qdrant
Self-hosted / cloud: High-performance, Rust-based vector database with both managed cloud and self-hosted options. Good for teams needing full infrastructure control with production-grade performance.
Chunking Strategies
Chunking is how you split your documents before embedding. It is one of the most impactful decisions in a RAG system — and one of the most commonly underestimated. Poor chunking is the leading cause of poor retrieval quality.
→ Fixed-size chunking
Split every N tokens (e.g., 512 tokens) with a small overlap (e.g., 50 tokens) to prevent context loss at boundaries. Simple, fast, predictable. Works well for homogeneous text but can split mid-sentence.
→ Sentence-window chunking
Split at sentence boundaries and include surrounding sentences for context. Better semantic coherence than fixed-size. The retrieval unit is small but the context passed to the LLM includes neighbouring sentences.
→ Semantic chunking
Split at natural semantic boundaries — paragraphs, sections, topic changes. Uses a secondary embedding comparison to detect where meaning changes significantly. Produces more coherent chunks at the cost of complexity.
→ Hierarchical chunking
Create both small child chunks (for precise retrieval) and larger parent chunks (for better context). Retrieve small chunks, then pass the parent chunk to the LLM. This is the "parent document retriever" pattern in LangChain.
→ Metadata enrichment
Attach metadata to each chunk — source document name, page number, section title, creation date, author. Enables metadata filtering at retrieval time: "only search chunks from HR documents created after 2024".
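The hierarchical strategy described above can be sketched without any framework. This is a simplified illustration, not LangChain's implementation: chunk sizes are counted in words rather than tokens, and word overlap is a hypothetical stand-in for vector similarity.

```python
def split_words(text, size):
    """Split text into consecutive windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_parent_child(document, parent_size=12, child_size=4):
    """Create large parent chunks and small child chunks that point back to their parent."""
    parents = split_words(document, parent_size)
    children = []
    for parent_id, parent in enumerate(parents):
        for child in split_words(parent, child_size):
            children.append({"text": child, "parent_id": parent_id})
    return parents, children

def retrieve_parent(query, parents, children):
    """Match on the small child chunks, then return the larger parent for context."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c["text"].lower().split())))
    return parents[best["parent_id"]]

# Illustrative document (made up for this example)
document = (
    "Paternity leave is 15 days for eligible staff. "
    "Leave must be taken within 90 days of birth. "
    "Password resets are handled by the IT service desk portal online."
)
parents, children = build_parent_child(document)
context = retrieve_parent("password resets", parents, children)
print(context)
```

The match happens on a four-word child, but the LLM would receive the twelve-word parent, so precise retrieval does not come at the cost of thin context.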
Why bad chunking causes bad answers
If a critical sentence is split across two chunks and neither chunk is retrieved, the LLM will not have the information it needs. If chunks are too large, they dilute the query match score and bring in irrelevant content. If there is no overlap, context at chunk boundaries is lost. Chunking strategy should be tested with RAGAS evaluation against real queries on your specific document types.
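To make the effect of overlap concrete, here is a minimal sketch of fixed-size chunking. Integers stand in for token IDs, and the sizes are deliberately tiny so the boundaries are easy to see; a real pipeline would use the tokenizer that matches its embedding model.

```python
def chunk_fixed(tokens, size, overlap):
    """Split a token list into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

tokens = list(range(10))  # stand-in token IDs 0..9
with_overlap = chunk_fixed(tokens, size=4, overlap=1)
no_overlap = chunk_fixed(tokens, size=4, overlap=0)
print(with_overlap)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(no_overlap)    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With overlap, tokens 3 and 6 appear in two chunks each, so content sitting on a boundary is still retrievable together with its neighbours. Without overlap, every boundary is a hard cut.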
Enterprise RAG Use Cases
Internal Knowledge Assistant
Employees query company-wide policies, IT documentation, SOPs, and process guides. Replaces long searches through SharePoint or Confluence with instant grounded answers.
HR Policy Assistant
Answers questions about leave entitlements, appraisal processes, benefits, onboarding requirements, and compliance policies — grounded in the actual HR documentation.
Legal Document Search
Searches contract repositories, regulatory guidance, case law summaries, and compliance documents. Returns relevant clauses with citations for legal team review.
Clinical / Pharma Research
Searches clinical trial protocols, research publications, product data sheets, and safety information. Helps researchers and regulatory teams find relevant literature quickly.
Customer Support Knowledge Base
Agents query product documentation, troubleshooting guides, and release notes to resolve customer issues faster. Can be exposed directly to customers via a self-service chatbot.
Sales Enablement Assistant
Sales teams query product specifications, competitive battle cards, pricing guidelines, and customer case studies. Reduces time spent searching for the right information during deals.
Training Content Assistant
Learners ask questions about course materials, module content, and assessments. Provides instant clarification from actual course content with source references.
Common RAG Challenges
Poor document quality
OCR errors in scanned PDFs, inconsistent formatting, missing metadata, and duplicate content all degrade retrieval quality before a single query is made. Garbage in = garbage out.
Bad chunking decisions
Chunks that are too small miss context; too large dilute relevance scores. Splits at the wrong boundaries mean critical sentences get divided. Chunking strategy should be validated for your specific document types.
Irrelevant retrieval
Semantic similarity is not the same as relevance for the specific query. A vector search may retrieve chunks about similar-sounding topics that are not actually relevant. Reranking and hybrid search mitigate this.
Hallucinated citations
Even with retrieved context, LLMs can still hallucinate — especially if the prompt does not strictly constrain them to answer only from context. Source citation enforcement and faithfulness evaluation (RAGAS) are required.
Latency
RAG adds latency: embedding the query, vector search, optional reranking, and an LLM call all take time. For real-time applications, all components must be optimised — cached embeddings, fast vector search, streaming LLM responses.
Cost
Embedding 100,000 documents, storing them in a managed vector DB, and calling an LLM for every query adds up. Cost modelling — per-query cost, indexing cost, vector DB tier — is a production engineering concern.
Security and access control
Different users should only be able to retrieve documents they are authorised to see. Access control at the vector database level (metadata filtering, namespace isolation) is essential for multi-tenant enterprise RAG deployments.
Evaluation difficulty
It is non-trivial to know objectively whether your RAG system is working well. RAGAS provides automated evaluation metrics, but they require a test set with ground-truth question-answer pairs — which someone must create.
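Before adopting a full evaluation framework, a small ground-truth test set can already measure retrieval. The sketch below computes a simple hit rate at K. It is not one of the RAGAS metrics, and `fake_retrieve`, the chunk IDs and the questions are all hypothetical placeholders for your own retriever and data.

```python
def hit_rate_at_k(test_set, retrieve, k=3):
    """Fraction of test questions whose expected chunk appears in the top-k retrieved IDs."""
    hits = 0
    for case in test_set:
        retrieved_ids = retrieve(case["question"])[:k]
        if case["expected_chunk_id"] in retrieved_ids:
            hits += 1
    return hits / len(test_set)

def fake_retrieve(question):
    """Hypothetical retriever that returns canned rankings of chunk IDs."""
    canned = {
        "How long is paternity leave?": ["hr-7.4", "hr-7.3", "it-2"],
        "How do I reset my password?": ["hr-1.1", "hr-7.4", "it-2"],
    }
    return canned[question]

# Ground-truth pairs that someone has to write by hand
test_set = [
    {"question": "How long is paternity leave?", "expected_chunk_id": "hr-7.4"},
    {"question": "How do I reset my password?", "expected_chunk_id": "it-2"},
]
print(hit_rate_at_k(test_set, fake_retrieve, k=3))  # 1.0
print(hit_rate_at_k(test_set, fake_retrieve, k=1))  # 0.5
```

Tracking a number like this after each change to chunking or retrieval settings shows whether the change helped, even before generation quality is assessed.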
Production RAG Best Practices
Moving a RAG prototype into production requires addressing reliability, quality, cost, and security. These are the practices that separate tutorial RAG from production RAG.
Clean data ingestion
Pre-process documents before chunking — remove headers/footers, normalise whitespace, fix OCR errors. Data quality is the biggest lever in RAG quality improvement.
Metadata filtering
Attach rich metadata to chunks (department, document type, date, access level) and filter at retrieval time. Reduces noise from irrelevant documents.
Reranking
Add a cross-encoder reranker (Cohere Rerank, bge-reranker) after vector retrieval. Dramatically improves precision at the cost of slightly higher latency.
Source citations
Always return source metadata with answers. Users should be able to verify every AI-generated answer against the original document.
Evaluation with test sets
Build a test set of 20–50 real questions with expected answers. Run RAGAS against it regularly. Track faithfulness and context precision over time as the document store changes.
Monitoring with LangSmith
Instrument all LangChain calls with LangSmith. Trace retrieval queries, token counts, latency, and LLM responses. LangSmith is to RAG what APM is to backend services.
Access control
Use vector DB namespaces or metadata filtering to ensure users only retrieve documents they are authorised to see. Never assume all indexed content should be accessible to all users.
Cost optimisation
Cache frequently embedded queries, choose a smaller embedding model for large-scale indexing, use batch embedding for ingestion, and monitor token usage per query in LangSmith.
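Caching query embeddings, mentioned above, can be sketched with the standard library. This is an in-process illustration only: the body of `embed_query` is a placeholder where a real embedding API call would go, and the `calls` counter exists just to show how often that call would actually be made.

```python
from functools import lru_cache

calls = 0  # counts how many times the (placeholder) embedding call really runs

def normalise(query):
    """Lowercase and collapse whitespace so trivially different queries share a cache entry."""
    return " ".join(query.lower().split())

@lru_cache(maxsize=1024)
def embed_query(text):
    """Placeholder embedding: a real system would call its embedding model here."""
    global calls
    calls += 1
    return tuple(float(ord(ch) % 7) for ch in text[:8])

first = embed_query(normalise("Paternity  leave policy"))
second = embed_query(normalise("paternity leave policy"))
print(calls, first == second)
```

Both queries normalise to the same string, so the second lookup is served from the cache. A multi-instance deployment would need a shared cache instead of a per-process one.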
Build production RAG with live instruction
The Production AI Engineering programme builds production-grade RAG systems with full evaluation pipelines, monitoring, reranking, access control and multi-agent patterns — for developer teams.
View Production AI Engineering training →
RAG Skills for AI Engineers
Building production RAG systems requires a distinct set of skills beyond basic LLM API calls. AI engineers who can design, build, evaluate and optimise RAG pipelines are consistently in demand, more so than engineers with only tutorial-level RAG exposure.
+ Embedding models
Selecting, using and comparing embedding models. Understanding trade-offs between quality and cost.
+ Vector databases
Indexing, querying, metadata filtering and access control across Pinecone, Chroma, pgvector.
+ Retriever design
Vector, keyword, hybrid, and self-query retrievers. Choosing the right strategy for the data type.
+ Prompt templating
Structuring context + query prompts to minimise hallucination and enforce citation.
+ RAGAS evaluation
Faithfulness, answer relevancy, context precision and recall. Building and running evaluation pipelines.
+ Deployment
FastAPI, Docker, and cloud deployment of RAG services. LangSmith instrumentation for production monitoring.
For the complete skill set required for AI engineers — including RAG, agents, MCP, deployment and LLMOps — see the AI Engineer Skills guide.
RAG Project Ideas
The best way to learn RAG is to build a real project — deployed, with RAGAS evaluation, and publicly accessible on GitHub. These are the RAG project types with the highest learning value and portfolio signal.
→ Company Knowledge Assistant
Index internal policies, SOPs, and guides. Build a chat interface that answers employee questions with source citations from actual company documents.
→ PDF Q&A Chatbot
Upload any PDF and ask questions about it. Demonstrates the full pipeline: loader, chunker, embeddings, retrieval, generation, and streaming. Deploy with FastAPI.
→ Clinical Research Document Assistant
Index medical research papers or drug data sheets. A domain-specific RAG system that demonstrates retrieval quality for technical vocabulary.
→ Customer Support Bot
Index product documentation and FAQs. Answers support queries with grounded responses and citations. Add guardrails for off-topic queries and escalation logic.
→ Course Content Assistant
Index course materials, lecture notes, and textbooks. Helps learners ask specific questions and get answers from the actual course content — with section references.
For full project walkthroughs — including architecture, tools, skills demonstrated, and GitHub presentation tips — see the AI Engineer Projects guide.
Recommended Learning Path
| Goal | Recommended Resource |
|---|---|
| Understand the full AI engineering discipline RAG sits within | AI Engineering Guide → |
| Follow a structured stage-by-stage roadmap for building RAG and AI skills | AI Engineering Roadmap → |
| Learn every technical skill required to build production RAG systems | AI Engineer Skills Guide → |
| See RAG project walkthroughs with architecture, tools and deployment steps | AI Engineer Projects Guide → |
| Build RAG systems with live instruction and 5 production projects | AI Engineering Course → |
| Go deep on production RAG, reranking, evaluation pipelines and MCP | Production AI Engineering → |
Want to build production-ready RAG systems?
Reading about RAG is the foundation. Building and deploying a production RAG system with evaluation, monitoring, and reranking is where the real skill is developed. The AI Engineering Course and Production AI Engineering programme provide structured, live-instructor-led paths to get there.
Frequently Asked Questions — What is RAG?
What is RAG in AI?
RAG stands for Retrieval-Augmented Generation. It is a technique that combines a retrieval step — fetching relevant information from documents or a database — with a generation step by a large language model (LLM). Instead of answering from training data alone, the LLM receives retrieved context at inference time and generates an answer grounded in that specific information. RAG enables LLMs to answer questions about private documents, up-to-date information, and domain-specific knowledge they were not trained on.
What is the full form of RAG?
RAG stands for Retrieval-Augmented Generation. The term was introduced in a 2020 paper by researchers at Meta AI (Facebook AI Research) — "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis et al. The name describes the approach: augmenting an LLM's generation with a retrieval step.
How does Retrieval-Augmented Generation work?
RAG works in two main phases. Indexing phase: documents are loaded, split into chunks, each chunk is converted to a vector embedding, and embeddings are stored in a vector database. Query phase: (1) the user's query is embedded into a vector, (2) the vector database is searched for semantically similar chunks, (3) the top-K most relevant chunks are retrieved, (4) the chunks are formatted as context in a prompt template, (5) the LLM generates an answer using the retrieved context, (6) optional source citations are returned with the response.
Why is RAG used with LLMs?
RAG addresses three key LLM limitations: (1) knowledge cutoff: the model only knows what was in its training data, not recent events or updates; (2) no private data access: the model has no access to company documents, internal knowledge bases, or proprietary information; (3) hallucination: without specific context, the model may generate plausible-sounding but incorrect answers. RAG mitigates all three by providing the LLM with retrieved, relevant context at query time.
Is RAG better than fine-tuning?
They serve different purposes. RAG is better when knowledge needs to be dynamic (updated without retraining), auditable (with source citations), and cost-effective. Fine-tuning is better when you want to change the model's behavior, tone, or task format — such as teaching it a specific writing style or response format. Most production enterprise AI systems use RAG for factual knowledge and may combine it with fine-tuning for behavior adaptation. For most knowledge assistant use cases, RAG alone is sufficient and significantly cheaper than fine-tuning.
Does RAG reduce hallucinations?
Yes, significantly — but not completely. RAG reduces hallucinations on domain-specific queries by grounding the LLM's answer in retrieved context. When the model has accurate, relevant context in the prompt, it is less likely to fabricate information. However, two failure modes remain: (1) if retrieval fails to fetch relevant content, the model may still hallucinate; (2) if the retrieved context is itself incorrect or ambiguous, the model may propagate that error. Production RAG systems address this with RAGAS evaluation, reranking, and source citation enforcement.
Which vector databases are used in RAG?
Common vector databases used in RAG: Chroma (local development, simple Python setup), Pinecone (managed cloud service, scalable for production), FAISS (in-memory, fast, good for prototyping), pgvector (PostgreSQL extension, good for teams already using Postgres), Weaviate (schema-native, good for complex metadata filtering), and Qdrant (Rust-based, self-hosted or managed cloud). The choice depends on deployment context, scale, and whether you need managed infrastructure or self-hosted control.
What is chunking in RAG?
Chunking is the process of splitting source documents into smaller pieces (chunks) before embedding and indexing them. It is necessary because embedding models have token limits and because retrieving an entire large document is inefficient — you want to retrieve only the relevant passage. Common chunking strategies: fixed-size (split every N tokens with overlap), sentence-window (keep surrounding context), semantic (split at natural boundaries like paragraphs), and hierarchical (chunk + parent chunk). Poor chunking is one of the most common causes of RAG quality failures.
What are common RAG use cases?
The most widely deployed RAG use cases are: (1) internal knowledge assistants — employees query company policies, procedures and documentation; (2) HR policy assistants — onboarding, leave, compliance queries; (3) customer support bots — product knowledge base, troubleshooting guides; (4) legal document search — contracts, case summaries, regulatory docs; (5) clinical/pharma research assistants — literature search, protocol documents; (6) sales enablement assistants — product specs, pricing, competitive intelligence; (7) course and training content assistants — learner Q&A from course materials.
Is RAG important for AI engineers?
Yes. RAG is one of the most widely deployed enterprise LLM architectures, and the ability to design, build, evaluate and optimise production RAG systems is among the most in-demand AI engineering skills. Employers want engineers who understand the full pipeline (document loading, chunking strategy, embedding model selection, vector database tuning, retrieval evaluation, reranking, and RAGAS metrics), not just those who have run a basic LangChain RAG tutorial.
Can I build a RAG project as a beginner?
Yes. A basic RAG project requires intermediate Python, LLM API access (OpenAI or Anthropic), and LangChain. The minimal stack is: LangChain document loaders, a text splitter, OpenAI embeddings, Chroma as the vector store, a retrieval chain, and a FastAPI endpoint. A working local RAG system can be built in a day. The challenge for beginners is deployment and evaluation — moving from a working notebook to a deployed API with RAGAS evaluation scores is where the real learning happens.
Which Technovids resource should I read next?
If you want to understand the full AI engineering discipline that RAG sits within, read the AI Engineering guide at /ai-engineering. For a sequenced roadmap of how to build RAG skills step by step, see the AI Engineering Roadmap at /ai-engineering-roadmap. For the specific skills RAG requires, see the AI Engineer Skills guide. For RAG project ideas and architecture walkthroughs, see the AI Engineer Projects guide. For structured live training with 5 production RAG and agent projects, explore the AI Engineering Course.