What is a Vector Database? AI, RAG and Semantic Search Explained
A vector database stores data as numerical embeddings and enables semantic search — finding information by meaning, not just keywords. It is the retrieval engine inside most RAG systems, AI document assistants, semantic search applications, and recommendation systems built on large language models.
This guide explains what vector databases are, how embeddings and vector search work, how they power RAG pipelines, which tools are used in production, and the skills AI engineers need to build retrieval-based AI applications.
Vector Database: Quick Facts
| Item | Explanation |
|---|---|
| Definition | A database that stores high-dimensional numerical vectors (embeddings) and enables fast similarity search by meaning |
| Main purpose | Find semantically similar data — documents, images, products — to a given query, without exact keyword matching |
| Used in | RAG systems, document Q&A assistants, semantic search, recommendation engines, AI chatbots, enterprise knowledge tools |
| Core concept | Embeddings — numerical representations of text or other data where similar meanings are geometrically close in vector space |
| Common tools | Pinecone, Chroma, FAISS, pgvector, Weaviate, Qdrant, Milvus |
| Main benefit | Enables AI applications to retrieve relevant context by semantic meaning rather than keyword match — dramatically improving answer quality |
| Main limitation | Retrieval quality depends entirely on embedding quality, chunk design, and metadata strategy — poor input = poor retrieval regardless of database choice |
| Related Technovids resource | What is RAG? Guide · AI Engineering Course · AI Engineer Skills Guide |
What is a Vector Database?
A vector database stores numerical representations of text, images or other data — called embeddings — so AI applications can search by meaning instead of exact keywords. When you query a vector database, you do not ask "which records contain the word 'policy'?" — you ask "which records are semantically most similar to this question?"
Simple analogy
A traditional database is like a filing cabinet organised by label — you can only find a file if you know exactly what the label says. A vector database is like a librarian who has read every book and understands what each is about — you can say "find me something about tax planning for small businesses" and get relevant results even if none of the documents use that exact phrase.
Vector databases are the core infrastructure component of most RAG systems, semantic search applications, and AI-powered knowledge tools. They exist because the most valuable enterprise data — documents, support tickets, clinical notes, product descriptions, emails — is unstructured, and cannot be searched meaningfully with SQL or keyword queries alone.
Why Vector Databases Matter in AI
Large language models are powerful text generators, but they have a fundamental limitation: they only know what they were trained on. For AI applications to answer questions about your internal documents, your product catalogue, or your customer history, they need a way to retrieve that specific information at runtime. This is where vector databases become essential.
Keyword search misses too much
An HR policy document might call it "parental leave" while employees search for "maternity leave" or "paternity benefits". SQL LIKE queries and traditional keyword search will miss these matches. Vector search can find them because the underlying meaning is similar, even when the words differ.
LLMs need relevant context to answer accurately
Without retrieval, an LLM generates answers from training data alone — which may be outdated, generic, or simply wrong for your specific domain. RAG uses a vector database to retrieve the relevant, current, private context the LLM needs before generating a response.
Enterprise knowledge is in documents, not databases
Most enterprise knowledge lives in PDFs, Word docs, wikis, Confluence pages, and email threads — not in structured databases. Vector databases allow AI systems to make this unstructured knowledge searchable and retrievable at scale.
Scale requires indexed search, not brute force
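Comparing a query embedding against millions of stored vectors one by one is too slow for most real-time applications. Vector databases use index structures such as HNSW (graph-based) and IVF (cluster-based) to make Approximate Nearest Neighbour (ANN) search fast even at millions or billions of vectors, trading a small amount of accuracy for a large gain in speed.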
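To make the IVF idea concrete, here is a minimal pure-Python sketch of an IVF-style index. It is an illustration, not how any particular database implements it: vectors are plain lists of floats, similarity is the dot product of unit-length vectors, and centroids are picked by sampling stored vectors (a simplification; real IVF indexes train centroids with k-means). The class and parameter names (`ToyIVFIndex`, `n_lists`, `nprobe`) are hypothetical. At query time only the `nprobe` closest clusters are scanned, which is where both the speed-up and the approximation come from.

```python
import math
import random


def normalise(v: list[float]) -> list[float]:
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ToyIVFIndex:
    """Hypothetical IVF-style index: each vector lives in the list of its nearest centroid."""

    def __init__(self, vectors: list[list[float]], n_lists: int, seed: int = 0):
        self.vectors = [normalise(v) for v in vectors]
        rng = random.Random(seed)
        # Simplification: sample stored vectors as centroids instead of training k-means.
        self.centroids = rng.sample(self.vectors, n_lists)
        self.lists: list[list[int]] = [[] for _ in range(n_lists)]
        for i, v in enumerate(self.vectors):
            nearest = max(range(n_lists), key=lambda c: dot(v, self.centroids[c]))
            self.lists[nearest].append(i)

    def search(self, query: list[float], k: int, nprobe: int = 1) -> list[int]:
        """Approximate top-k: scan only the nprobe lists whose centroids best match the query."""
        q = normalise(query)
        order = sorted(range(len(self.centroids)),
                       key=lambda c: dot(q, self.centroids[c]), reverse=True)
        candidates = [i for c in order[:nprobe] for i in self.lists[c]]
        candidates.sort(key=lambda i: dot(q, self.vectors[i]), reverse=True)
        return candidates[:k]


def brute_force_search(vectors: list[list[float]], query: list[float], k: int) -> list[int]:
    """Exact top-k: compare the query against every stored vector."""
    q = normalise(query)
    unit = [normalise(v) for v in vectors]
    return sorted(range(len(unit)), key=lambda i: dot(q, unit[i]), reverse=True)[:k]
```

Setting `nprobe` equal to the number of lists scans everything and reproduces the exact brute-force result; smaller values scan fewer vectors and may miss some true neighbours. That speed-versus-recall trade-off is the "approximate" in ANN.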
Metadata filters add precision
Vector databases allow combining semantic similarity search with structured metadata filters: "find the most semantically similar documents, but only from Q4 2025 and only tagged Finance". This combination of semantic search and structured filtering is what makes them useful for real enterprise applications, not just demos.
What Are Embeddings?
Embeddings are the language that vector databases speak. An embedding is a list of numbers (typically a few hundred to a few thousand floating-point values, for example 1,536 for OpenAI text-embedding-3-small) that encodes the semantic meaning of a piece of text (or image, audio, or other data). An embedding model generates this list such that texts with similar meanings produce numerically similar vectors.
Embeddings in practice — a concrete example
Text A: "How do I reset my password?"
Text B: "I forgot my login credentials"
Text C: "What is the capital of France?"
A and B share almost no words but have the same meaning. Their embeddings will be very close in vector space (high cosine similarity). C has entirely different meaning — its embedding will be far from both A and B.
When you build a document Q&A system, you run every document chunk through an embedding model to get its vector. When a user asks a question, you run the question through the same model to get its vector. Then you ask the vector database: "which stored vectors are most similar to this query vector?" — and you get back the most semantically relevant document chunks.
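The index-then-query flow above can be sketched in plain Python. This is an illustration under stated assumptions, not a real retrieval stack: `toy_embed` is a hypothetical stand-in for an embedding model (it hashes words into a fixed-size vector, so it captures word overlap, not meaning), the chunks are made up, and the "database" is a Python list searched by brute force. What it does reflect is the structure: the same embedding function is used for indexing and for querying, and results are ranked by cosine similarity.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical embedding size for the toy model


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0  # crc32 is deterministic across runs
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # toy_embed returns unit-length vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Indexing: embed every (made-up) chunk once and keep the text alongside its vector.
chunks = [
    "Employees may take up to 26 weeks of parental leave.",
    "Reset your password from the account settings page.",
    "Expense claims must be submitted within 30 days.",
]
index = [(toy_embed(c), c) for c in chunks]


def search(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Querying: embed the query with the SAME function, then rank chunks by similarity."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), text) for vec, text in index]
    return sorted(scored, reverse=True)[:k]
```

For example, `search("How do I reset my password?", k=1)` returns the password chunk because the two texts share words; a real embedding model would also match "I forgot my login credentials", which this toy cannot.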
Common embedding models
- OpenAI text-embedding-3-small / large
- Cohere embed-v3
- Google text-embedding-004
- HuggingFace sentence-transformers
- Mistral Embed
Key embedding properties
- Same model must be used for indexing and querying
- Dimensions vary by model (for example, 384 for small sentence-transformers models up to 3,072 for OpenAI text-embedding-3-large); higher is not always better
- Shorter texts (chunks) embed better than very long documents
- Domain-specific models may outperform general models for specialised content
How Vector Search Works: Step by Step
1. Content is split into chunks
Source documents (PDFs, web pages, database records) are divided into smaller segments: paragraphs, sentences, or fixed-size text windows. Chunk size affects retrieval quality: too small loses context; too large dilutes relevance.
2. Each chunk is converted into an embedding
An embedding model processes each chunk and outputs a high-dimensional vector. This vector numerically represents the meaning of that chunk.
3. Embeddings are stored in a vector database
Each vector is stored alongside the original chunk text and any metadata (source file, date, category, user access level). An index structure (HNSW, IVF) is built to enable fast search.
4. User query is converted into an embedding
When a user submits a question, the same embedding model converts it into a query vector. The model must be identical to the one used during indexing; otherwise the vectors are not comparable.
5. Similar vectors are retrieved
The vector database scores stored vectors against the query vector using a distance metric (cosine similarity, dot product, or Euclidean distance) and returns the top-K most similar chunks. With an ANN index, only a fraction of the stored vectors is actually compared.
6. Context is sent to an LLM
The retrieved chunks are assembled as context in a prompt template, alongside the user's question, and sent to a large language model. The LLM generates an answer grounded in the retrieved information.
Vector Database Architecture
The full pipeline from source documents to an LLM answer via a vector database.
Indexing pipeline (offline): Documents (PDF · DOCX · Web) → Chunks (split by size / semantic) → Embeddings (embedding model) → Vector DB (HNSW index)
Query pipeline (real-time): User Query (question / prompt) → Query Embedding (same model) → Similarity Search (top-K results) → LLM + Context (grounded answer) → Answer (to user)
Vector Database vs Traditional Database
| Dimension | Traditional DB (SQL) | Vector Database |
|---|---|---|
| Data type | Structured rows and columns | High-dimensional numerical vectors (embeddings) |
| Search method | Exact match, range, JOIN | Approximate Nearest Neighbour (ANN) similarity search |
| Best for | Transactions, structured records, relational queries | Semantic search, RAG retrieval, recommendation, similarity |
| Query style | SQL: WHERE name = 'X' | Find the K vectors most similar to this query vector |
| AI use cases | Limited — structured metadata only | Core component — retrieval in RAG, semantic search, agents |
| Limitations | Cannot handle unstructured text search by meaning | Retrieval quality depends on embedding quality; not designed for ACID transactions or relational joins |
Many production AI systems use both: a relational database (PostgreSQL, MySQL) for structured application data and a vector database (or pgvector extension) for semantic search over unstructured content.
Vector Search vs Keyword Search
| Property | Keyword Search (BM25) | Vector Search |
|---|---|---|
| Matching | Exact token matching — finds documents containing the words | Semantic matching — finds documents with similar meaning |
| Synonyms | ❌ "maternity leave" misses "parental leave" | ✅ Finds semantically equivalent phrases |
| Ranking | TF-IDF / BM25 frequency scoring | Distance in embedding space (cosine similarity) |
| Speed | Very fast — inverted index lookup | Fast — ANN index; slower than keyword at extreme scale |
| Metadata filters | ✅ Standard — WHERE clause | ✅ Supported — pre-filter or post-filter by metadata |
| Hybrid search | — (keyword only) | ✅ Combines vector search + BM25 for best-of-both results |
| Best for | Exact phrase search, known-item lookup | Conceptual/semantic queries, paraphrased questions, multilingual |
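For comparison, the keyword side of this table can be sketched as a small Okapi BM25 scorer. This is a minimal pure-Python illustration with simple regex tokenisation, the Lucene-style IDF `log(1 + (N - n + 0.5) / (n + 0.5))`, and the commonly used values k1 = 1.5 and b = 0.75 (assumptions; search engines tune these differently). It shows why keyword search misses synonyms: a document contributes nothing for a query term unless it contains that exact token.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 scorer (Lucene-style IDF, which is never negative)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequency per document
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact token match, no contribution
            f = tf[term]
            # Saturating term frequency, normalised by document length.
            s += self.idf[term] * f * (self.k1 + 1) / (
                f + self.k1 * (1 - self.b + self.b * dl / self.avgdl))
        return s

    def rank(self, query: str) -> list[int]:
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
```

With the documents "parental leave policy for all employees" and "maternity leave request form", the query "maternity leave" scores the second higher, and the first gets credit only for the shared word "leave"; a section titled "family care policy" would score zero.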
Vector Databases in RAG
Vector databases are the retrieval layer of most RAG (Retrieval-Augmented Generation) systems. When a user asks a question, the RAG pipeline queries the vector database to retrieve the most semantically relevant document chunks, then passes those chunks as context to the LLM — enabling the model to answer questions about private, current, or domain-specific information it was not trained on.
The RAG retrieval loop
1. Documents indexed as vector embeddings in the vector database
2. User query embedded with the same model
3. Top-K similar chunks retrieved from the vector database
4. Chunks injected into a prompt template as context
5. LLM generates an answer grounded in retrieved context
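Step 4 of the loop, injecting retrieved chunks into a prompt template, is plain string assembly. The sketch below assumes each retrieved chunk is a dict with hypothetical `text` and `source` keys; the template wording and the chunks are illustrative, not a recommended production prompt. Numbering the chunks and keeping their sources lets the LLM cite where an answer came from.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say you do not know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Format top-K retrieved chunks (hypothetical 'text'/'source' keys) into one prompt."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up retrieval results standing in for step 3 of the loop.
retrieved = [
    {"text": "Parental leave is 26 weeks.", "source": "hr_policy.pdf"},
    {"text": "Leave requests go through the HR portal.", "source": "hr_faq.md"},
]
prompt = build_prompt("How long is maternity leave?", retrieved)
```

The resulting `prompt` string is what would be sent to the LLM in step 5.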
For a complete explanation of how RAG works architecturally — including the full pipeline, chunking strategies, reranking, evaluation with RAGAS, and production deployment — see the complete RAG guide. For guidance on when to use RAG versus fine-tuning, see the RAG vs Fine-Tuning comparison.
Common Vector Database Tools
The ecosystem has grown significantly. These are the tools most commonly used in AI engineering and RAG production systems as of 2026.
Pinecone
Managed cloud · Fully managed, serverless vector database. Simple API, built-in metadata filtering, namespace isolation, and high availability. A popular choice for production RAG deployments that want managed infrastructure without operational overhead.
Chroma
Local / embedded · Open-source, Python-native vector store. Zero infrastructure to spin up; installs as a pip package. Persists to local disk. Native LangChain and LlamaIndex integration. The standard starting point for prototypes, demos, and small-scale RAG applications.
FAISS
Library (in-memory) · Facebook AI Similarity Search. A high-performance vector search library, not a full database. Operates in memory (indexes can be saved to disk). Extremely fast for ANN search. No built-in metadata storage or API. Used directly or via LangChain wrappers for prototyping and research.
pgvector
PostgreSQL extension · Open-source PostgreSQL extension that adds a vector column type and similarity search operators. Best for teams already running Postgres who want to avoid introducing a new infrastructure component. Handles moderate scale well.
Weaviate
Self-hosted / cloud · Open-source vector database with native schema support, built-in hybrid search (BM25 + vector), object storage, and a GraphQL API. Good for applications with rich metadata and complex filtering requirements.
Qdrant
Self-hosted / cloud · High-performance Rust-based vector database. Supports both managed cloud and self-hosted deployment. Fast, memory-efficient, with payload (metadata) filtering and sparse vector support for hybrid search.
Milvus
Open-source / cloud · Open-source vector database built for large-scale production. Zilliz Cloud is the managed version. Handles billions of vectors. Used by large-scale enterprise AI and recommendation systems that need horizontal scalability.
Pinecone vs Chroma vs FAISS vs pgvector
Each tool suits different contexts. This is a brief orientation — the right choice depends on your team's infrastructure, scale, and budget.
| Tool | Best for | Notes |
|---|---|---|
| Pinecone | Production, managed cloud | No infrastructure to manage. Serverless tier available. Metadata filtering and namespacing built in. Cost scales with index size. |
| Chroma | Prototyping, local dev | Zero setup. Python package. LangChain native. Not designed for multi-user production deployments at scale. |
| FAISS | In-memory research / prototype | Fastest for in-memory ANN search. No server, no API, no persistence by default. Use via LangChain for convenience. |
| pgvector | Teams already using PostgreSQL | SQL queries + vector search in one place. No extra infra. Slower than dedicated databases at very large scale (>10M vectors). |
| Weaviate | Hybrid search, rich schema | BM25 + vector natively. Good for complex metadata. More configuration than simpler tools. |
| Qdrant | Self-hosted or cloud, performance | Fast, Rust-based, supports sparse vectors. Good for teams needing full control over infrastructure. |
Chunking and Metadata
A vector database stores whatever you give it. If your input chunks are poorly designed, the embeddings will be poor, and retrieval quality will be poor — regardless of which vector database you choose. Chunking strategy is one of the most consequential engineering decisions in a RAG system.
Chunk size
Too small (e.g., 50 tokens) loses context — retrieved chunks may not contain enough information for the LLM to answer. Too large (e.g., 2,000 tokens) dilutes the embedding — a single chunk covers multiple topics, making it less semantically precise. Typical starting points: 256–512 tokens for general documents, 128 tokens for dense technical text.
Chunk overlap
Adding a small overlap (10–20% of chunk size) between adjacent chunks reduces the chance that relevant content is split across a boundary and missed at retrieval time. Most LangChain text splitters expose this as a configurable overlap parameter (chunk_overlap).
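A fixed-size splitter with overlap fits in a few lines. This sketch treats whitespace-separated words as a stand-in for tokens (an assumption; production splitters usually count model tokens or characters), and `chunk_words` with its default sizes is a hypothetical helper, not a library function. Each window starts `chunk_size - overlap` words after the previous one, so the last `overlap` words of one chunk reappear at the start of the next.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 30) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For instance, ten words split with `chunk_size=4, overlap=1` produce three chunks, each beginning with the last word of the previous one.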
Semantic chunking
Instead of splitting by character or token count, semantic chunkers split at topic boundaries — detecting when the content shifts to a new idea. This produces more coherent, self-contained chunks but requires more processing.
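One simple way to implement semantic chunking is to embed consecutive sentences and start a new chunk wherever the similarity between neighbours drops below a threshold. The sketch below assumes sentences are already split and takes a caller-supplied `embed` function that returns unit-length vectors; the function name and the 0.5 threshold are illustrative assumptions that would need tuning per embedding model. Library implementations often choose breakpoints differently, for example from the distribution of similarity drops across the whole document.

```python
from typing import Callable


def semantic_chunks(
    sentences: list[str],
    embed: Callable[[str], list[float]],
    threshold: float = 0.5,
) -> list[str]:
    """Group consecutive sentences; start a new chunk where neighbour similarity drops."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        similarity = sum(a * b for a, b in zip(prev, vec))  # cosine, for unit vectors
        if similarity < threshold:
            chunks.append(" ".join(current))  # topic shift: close the current chunk
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

The extra cost mentioned above comes from the `embed` call per sentence, on top of embedding the final chunks.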
Metadata filters
Every chunk should be stored with structured metadata — source file name, document date, category, author, access level. Metadata enables precise filtering alongside semantic search: "retrieve the most relevant chunks, but only from HR documents published after January 2026".
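The filtered query described above can be sketched as "filter first, then rank" (pre-filtering). In this illustration each stored record is a dict with an `id`, a `vector` and a `metadata` dict; the field names (`category`, `published`), the record layout and the vectors are made-up assumptions, not any particular database's schema. Real vector databases express the same idea through their own filter syntax.

```python
from datetime import date
from typing import Callable


def filtered_search(
    records: list[dict],
    query_vec: list[float],
    keep: Callable[[dict], bool],
    k: int = 3,
) -> list[dict]:
    """Pre-filter records on metadata, then rank the survivors by dot-product similarity."""
    candidates = [r for r in records if keep(r["metadata"])]
    candidates.sort(key=lambda r: sum(a * b for a, b in zip(query_vec, r["vector"])),
                    reverse=True)
    return candidates[:k]


records = [
    {"id": "hr-1", "vector": [0.9, 0.1], "metadata": {"category": "HR", "published": date(2026, 3, 1)}},
    {"id": "hr-2", "vector": [0.8, 0.2], "metadata": {"category": "HR", "published": date(2025, 6, 1)}},
    {"id": "fin-1", "vector": [0.95, 0.05], "metadata": {"category": "Finance", "published": date(2026, 2, 1)}},
]


def hr_after_jan_2026(metadata: dict) -> bool:
    """'Only HR documents published after January 2026.'"""
    return metadata["category"] == "HR" and metadata["published"] > date(2026, 1, 31)


results = filtered_search(records, [1.0, 0.0], hr_after_jan_2026)
```

Without the filter, the Finance chunk would rank first; with it, only `hr-1` is eligible. An access-level rule (for example, requiring the user's groups to overlap a chunk's allowed groups) fits into the same `keep` predicate.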
Bad chunking leads to poor retrieval
A common failure mode: developers index an entire 50-page PDF as a single chunk, or split mid-sentence, or embed the same boilerplate header in every chunk. The result is an LLM that either cannot find relevant information or retrieves the wrong context and hallucinates.
Similarity Search and Distance Metrics
When you ask a vector database "which stored vectors are most similar to my query?", it needs a way to measure similarity. Three metrics are commonly used:
Cosine Similarity
Score: −1 to 1 (higher = more similar)
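Measures the angle between two vectors. Ignores vector magnitude; only direction matters. The most widely used metric for text embeddings. Higher scores mean closer meaning, but the typical range depends on the embedding model (many text embedding models rarely produce negative scores). A low score indicates unrelated text rather than "opposite" meaning, so similarity thresholds should be calibrated per model.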
Best for: Text embeddings, semantic search
Dot Product
Score: any real number (higher = more similar)
The sum of element-wise products. Unlike cosine, it accounts for both direction and magnitude. Used when vector magnitude carries meaning (e.g., in some recommendation models), and with normalised (unit-length) embeddings, where the dot product equals cosine similarity and is cheaper to compute.
Best for: Normalised embeddings, recommendation
Euclidean Distance
Score: ≥0 (lower = more similar)
Straight-line distance between two points in vector space. Smaller distance means more similar. Common in image embeddings and clustering. Less common for text semantic search, where cosine is preferred.
Best for: Image embeddings, clustering
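The three metrics are short formulas. This sketch computes each one in plain Python on made-up three-dimensional vectors, then checks two relationships for unit-length vectors: the dot product equals cosine similarity, and squared Euclidean distance equals 2 − 2·cosine, so all three metrics rank normalised vectors in the same order.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))  # angle only; magnitude cancels out


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unit(a: list[float]) -> list[float]:
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0, 0.0]  # made-up example vectors
b = [6.0, 8.0, 0.0]  # same direction as a, twice the magnitude
c = [1.0, 2.0, 2.0]  # a different direction

print(cosine_similarity(a, b))   # 1.0: identical direction
print(dot(a, b))                 # 50.0: grows with magnitude
print(euclidean_distance(a, b))  # 5.0: apart in space despite the same direction

ua, uc = unit(a), unit(c)
# For unit-length vectors, dot product and cosine similarity coincide ...
assert math.isclose(dot(ua, uc), cosine_similarity(a, c))
# ... and squared Euclidean distance is 2 - 2 * cosine, so the rankings agree.
assert math.isclose(euclidean_distance(ua, uc) ** 2, 2 - 2 * cosine_similarity(a, c))
```

The `a`/`b` pair shows why magnitude matters: cosine treats them as identical, while dot product and Euclidean distance do not.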
Practical rule
For most text-based RAG and semantic search applications, use the similarity metric recommended by your embedding model's provider, which is usually cosine similarity. The choice of metric rarely determines retrieval success; chunking quality and embedding model quality matter far more.
Hybrid Search
Hybrid search combines vector search with keyword search (BM25) to get the best of both approaches. It runs a vector similarity search and a keyword search in parallel, then merges the results — typically using Reciprocal Rank Fusion (RRF) or a weighted score combination.
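Reciprocal Rank Fusion only needs each retriever's ranked list of document IDs: every document earns 1 / (k + rank) from each list it appears in, and the summed scores decide the merged order. The sketch below uses k = 60, the value proposed in the original RRF paper and a common default; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: score(d) = sum over lists of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["doc-parental-leave", "doc-family-care", "doc-holidays"]
keyword_hits = ["doc-invoice-INV-2024-0471", "doc-parental-leave"]
merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A document found by both retrievers (here `doc-parental-leave`) rises to the top, while an exact-identifier hit that only keyword search found still ranks ahead of weaker vector-only results. Because RRF uses ranks rather than raw scores, it avoids having to put BM25 scores and cosine similarities on the same scale.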
When hybrid search improves results
- Queries with specific product codes or technical identifiers (e.g., "invoice INV-2024-0471")
- Enterprise documents where exact terminology matters (legal, medical, compliance)
- Multilingual content where keyword matching in the original language helps
- When users mix semantic intent with exact phrase requirements
Tools that support hybrid search natively
- Weaviate: BM25 + vector built in
- Qdrant: sparse + dense vector support
- Azure AI Search: full hybrid retrieval stack
- Elasticsearch: native dense-vector kNN search combined with BM25
- pgvector + pg_trgm (PostgreSQL)
Most production enterprise RAG systems benefit from hybrid search. Pure vector-only retrieval performs well for general semantic queries but underperforms on queries that include specific identifiers, codes, or names that benefit from exact matching.
Vector Database Use Cases
Vector databases power a wide range of AI applications across industries.
Document Q&A assistant
Index company PDFs, policies, and reports. Employees ask natural-language questions and get answers grounded in the retrieved document content.
Internal knowledge search
Make Confluence, Notion, and SharePoint searchable by meaning — not just text. Find relevant content even when the exact terminology differs.
Customer support knowledge bot
Index product documentation, FAQs, and support ticket history. The support bot retrieves the most relevant context before generating a resolution.
Clinical / pharma document search
Semantic search over clinical trial reports, drug information documents, and research papers — critical where exact keyword matching misses related concepts.
Legal document search
Retrieve relevant contract clauses, case precedents, or regulatory text based on conceptual similarity to a legal query.
Product search and recommendation
Embed product descriptions and images. Enable "find products similar to this" or match natural-language search queries to a product catalogue without keyword overlap.
Recommendation systems
Embed user preferences, content items, or collaborative signals as vectors. Find and recommend items that are semantically similar to what a user has engaged with.
Training content search
Index course materials, assessments, and learning resources. Match learner queries to the most relevant training content for personalised learning paths.
Vector Database Challenges
Poor embedding quality
Retrieval is only as good as the embeddings. A generic embedding model may fail on specialised domain vocabulary (legal, clinical, engineering). A poorly matched embedding model is one of the most common root causes of poor RAG retrieval.
Bad chunking strategy
Chunks that are too large, too small, or split across natural boundaries produce embeddings that do not represent coherent ideas — leading to irrelevant or partially-relevant retrievals.
Irrelevant retrieval
The top-K retrieved chunks may be semantically close to the query but not factually relevant to the question. This is a precision problem that reranking, metadata filtering, and better chunking can address.
Duplicate and redundant content
Indexing the same document multiple times, or indexing documents with significant boilerplate repetition, causes duplicated retrievals and wasted context window space.
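A cheap first defence is to drop exact and near-exact duplicates before embedding. This sketch hashes each chunk after normalising case and whitespace, so it only catches chunks that are identical apart from formatting; paraphrased duplicates would need similarity-based (for example, embedding-based) deduplication, which this does not attempt. `dedupe_chunks` is a hypothetical helper.

```python
import hashlib
import re


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk, ignoring case and whitespace differences."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        normalised = re.sub(r"\s+", " ", chunk).strip().lower()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Running this over all chunks before indexing means a boilerplate header repeated on every page is embedded, and retrieved, only once.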
Latency at scale
Vector search is fast but not free. At very large index sizes (hundreds of millions of vectors) and high query volumes, latency management, index sharding, and hardware optimisation become engineering concerns.
Cost
Managed vector databases charge for storage and query volume. Embedding model API calls add cost during both indexing and query time. At enterprise scale, these costs are material and need to be budgeted.
Security and access control
In multi-tenant applications, different users must only retrieve documents they are authorised to see. Access control must be enforced at the retrieval layer — not just the application layer — using namespace isolation or metadata-based filtering.
Evaluation difficulty
Unlike traditional database queries, it is hard to tell whether retrieval is working correctly. Evaluation requires labelled query-document pairs and metrics such as those in the RAGAS framework (faithfulness, context precision, answer relevancy), which take deliberate effort to build and maintain.
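Before reaching for an LLM-judged framework such as RAGAS, a labelled set of query-to-relevant-chunk pairs already supports simple, deterministic retrieval metrics. The sketch below computes precision@K and recall@K averaged over queries; these are classic information-retrieval metrics, not the RAGAS metrics themselves, and the `retrieve` callable, queries and chunk IDs are hypothetical placeholders.

```python
from typing import Callable


def precision_recall_at_k(
    labelled: dict[str, set[str]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> tuple[float, float]:
    """Average precision@k and recall@k over queries with known relevant chunk IDs."""
    precisions, recalls = [], []
    for query, relevant in labelled.items():
        retrieved = retrieve(query, k)[:k]
        hits = len(set(retrieved) & relevant)
        precisions.append(hits / k)  # share of retrieved slots that are relevant
        recalls.append(hits / len(relevant) if relevant else 0.0)  # share of relevant found
    n = len(labelled)
    return sum(precisions) / n, sum(recalls) / n
```

Re-running this on the same labelled set after every chunking or retrieval change gives a quick regression signal alongside fuller RAGAS-style evaluation.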
Best Practices for Vector Database Projects
Clean source data before indexing
Remove headers, footers, boilerplate, and duplicate content before chunking. Garbage in = garbage retrieved.
Design chunks around retrieval, not ingestion
Chunk size decisions should be validated by retrieval quality — not just by what is easy to ingest. Test retrieval on representative queries before scaling.
Use metadata filters from day one
Add source, date, category, and access-level metadata to every chunk. Retrofitting metadata into an existing index is expensive. Design the metadata schema before ingesting.
Implement reranking
A two-stage retrieval pipeline (broad vector search for the top 20 candidates, then a cross-encoder reranker to select the best 3–5) typically improves precision noticeably at the cost of modest added latency.
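The two-stage pattern can be written independently of any particular reranker. In this sketch, `first_stage` and `rerank_score` are hypothetical callables: in practice the first would be a vector (or hybrid) search and the second a cross-encoder that scores each (query, chunk) pair. The defaults of 20 candidates and 5 final results mirror the numbers above and would need tuning.

```python
from typing import Callable


def retrieve_and_rerank(
    query: str,
    first_stage: Callable[[str, int], list[str]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Broad, cheap recall first; then a slower, more precise scorer picks the best few."""
    pool = first_stage(query, candidates)
    # A cross-encoder reranker reads query and chunk together, which is what makes it
    # more precise (and slower) than comparing two independently computed embeddings.
    ranked = sorted(pool, key=lambda chunk: rerank_score(query, chunk), reverse=True)
    return ranked[:final_k]
```

Because the reranker only sees the small candidate pool, its cost grows with `candidates`, not with the size of the index.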
Evaluate retrieval with RAGAS
Set up a RAGAS evaluation suite before deploying. Measure context precision, context recall, faithfulness, and answer relevancy on a labelled test set. Run it on every code change that touches chunking or retrieval.
Validate outputs with human review
Automated metrics are necessary but not sufficient. Schedule regular human review of retrieved chunks and LLM answers — especially for high-stakes applications.
Enforce access control at the retrieval layer
Use vector database namespacing, metadata filters, or application-level pre-filtering to ensure users can only retrieve documents they are authorised to access.
Monitor retrieval latency and cost in production
Track p50 and p99 retrieval latency, embedding API call volumes, and database query cost per request. These metrics reveal scaling issues before they affect users.
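p50 and p99 are percentiles of recorded request latencies. This sketch uses the nearest-rank method on an in-memory list of made-up timings; monitoring systems usually compute percentiles from histograms or streaming sketches instead, so their figures can differ slightly. `percentile` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [42, 38, 51, 47, 40, 39, 44, 300, 45, 41]  # made-up retrieval timings
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

Here the single slow request sets p99 (300 ms) while p50 stays at 42 ms, which is why both are tracked: the median describes the typical user, the tail exposes the slow ones.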
The Production AI Engineering training covers production RAG architecture, retrieval evaluation, RAGAS, and vector database deployment patterns in depth — with hands-on project work.
Skills AI Engineers Need for Vector Databases
Embeddings fundamentals
What embeddings are, how models differ, token limits, dimensions, and how to choose the right model for a domain.
Chunking strategy
Fixed-size chunking, recursive character splitting, semantic chunking, overlap configuration, and how to evaluate chunk quality.
Vector database operations
Index creation, upsert, query, metadata filtering, namespace management, and index maintenance for the main tools (Pinecone, Chroma, pgvector, Weaviate).
RAG pipeline construction
Document loader → text splitter → embedding model → vector store → retriever → prompt template → LLM — using LangChain or equivalent.
Retrieval evaluation
RAGAS metrics (faithfulness, context precision, context recall, answer relevancy), labelled test set construction, and iterative improvement workflow.
API and SDK integration
Working with OpenAI, Cohere, or HuggingFace embedding APIs; vector database Python SDKs; async request handling and rate limit management.
Reranking
Cross-encoder reranking (Cohere Rerank, FlashRank), two-stage retrieval pipeline design, and precision vs. latency trade-offs.
Deployment and monitoring
Wrapping retrieval pipelines in FastAPI, Dockerising, deploying to cloud, monitoring latency and cost, and handling index updates on document changes.
For the complete AI engineer skill map covering all competencies from Python through RAG, agents, MCP, and deployment, see the AI Engineer Skills guide.
Vector Database Project Ideas for Your Portfolio
PDF Q&A assistant
Beginner–Intermediate · Upload a set of PDFs, chunk and embed with LangChain, store in Chroma or Pinecone, build a retrieval chain, expose via FastAPI, and evaluate with RAGAS. The canonical RAG portfolio project.
Company knowledge assistant
Intermediate · Index your organisation's Confluence, Notion, or internal docs. Add metadata filtering (team, date, access level). Build a hybrid search system. Deploy with authentication. Go beyond the tutorial pattern.
Semantic search application
Intermediate · Build a web UI that allows users to search a large document collection by meaning, returning ranked results with source context, not just links. Demonstrates retrieval quality over a realistic corpus.
Product recommendation assistant
Intermediate · Embed product descriptions from a catalogue. Build a conversational interface where users describe what they are looking for in natural language and receive semantically matched product suggestions.
Support knowledge bot
Intermediate · Index a product documentation set and historical support ticket resolutions. Build a support assistant that retrieves the most relevant context before generating a response, with a citation of the source document.
Clinical document assistant
Advanced · Index a set of medical or pharmaceutical documents with access-control metadata. Build a search and summarisation tool with RAGAS evaluation. Demonstrates production-level thinking about data security and evaluation.
For full project specifications with architecture guidance, evaluation requirements, and deployment steps, see the AI Engineer Projects guide.
Recommended Technovids Learning Path
| Goal | Recommended Resource |
|---|---|
| Understand the AI engineering discipline vector databases belong to | AI Engineering Guide → |
| Learn how RAG systems use vector databases | What is RAG? Guide → |
| Decide between RAG and fine-tuning for your use case | RAG vs Fine-Tuning Guide → |
| Understand LangChain's role in connecting vector stores to LLMs | What is LangChain? Guide → |
| Build the full technical skill set for RAG and retrieval | AI Engineer Skills Guide → |
| Build a deployable vector database portfolio project | AI Engineer Projects Guide → |
| Join structured live training building RAG systems | AI Engineering Course → |
| Train your team on production RAG and vector database systems | Production AI Engineering → |
| Explore all Technovids AI Engineering resources | AI Engineering Resource Library → |
Want to build RAG systems with vector databases?
Understanding vector databases conceptually is the foundation. Building production RAG pipelines with proper chunking, evaluation, reranking, and deployment — and doing it with live instructor feedback and code review — is what makes the difference. Technovids offers live AI engineering training covering the full RAG and vector database stack from prototype to production.
Frequently Asked Questions — What is a Vector Database?
What is a vector database?
A vector database is a type of database that stores data as high-dimensional numerical vectors (embeddings) and enables fast similarity search — finding data by meaning or semantic closeness rather than exact keyword or ID matches. Unlike traditional databases that store and query structured rows, vector databases are optimised for the "find the most similar N items to this query" operation. They are the core retrieval layer in many RAG systems, semantic search applications, recommendation engines, and AI-powered document assistants.
Why are vector databases used in AI?
Vector databases are used in AI because large language models and AI applications frequently need to search large collections of unstructured data (documents, images, code, product descriptions) by meaning rather than by exact word match. A keyword search for "parental leave" will not match a document section titled "family care policy". A vector search is likely to, because both phrases are semantically similar in the embedding space. This semantic retrieval capability is essential for RAG systems, document Q&A assistants, recommendation systems, and any AI application that needs to ground its answers in relevant context.
What are embeddings?
Embeddings are numerical representations of text (or other data) as vectors: lists of floating-point numbers, typically a few hundred to a few thousand dimensions depending on the model. An embedding model (such as OpenAI text-embedding-3-small, Cohere embed-v3, or a HuggingFace sentence transformer) converts a piece of text into a vector such that semantically similar texts produce vectors that are geometrically close to each other in high-dimensional space. For example, "How do I reset my password?" and "I forgot my login credentials" would produce very similar vectors even though they share almost no words, because they mean the same thing.
How does vector search work?
Vector search works by: (1) converting your content into embeddings using an embedding model and storing those vectors in a vector database; (2) when a user submits a query, converting that query into an embedding using the same model; (3) computing the geometric distance or similarity between the query vector and all stored vectors; (4) returning the N vectors (and their associated content) that are most similar — i.e., closest in the embedding space — to the query. The similarity calculation uses a distance metric such as cosine similarity, dot product, or Euclidean distance. Most vector databases use Approximate Nearest Neighbour (ANN) algorithms like HNSW to make this search fast even across millions of vectors.
Is a vector database required for RAG?
Not strictly. RAG can be implemented without a dedicated vector database by storing embeddings in flat files, NumPy arrays, or a relational database. However, vector databases significantly improve retrieval speed, scalability, and manageability at any non-trivial scale. For prototypes with a handful of documents, FAISS (in-memory) or Chroma (local) are common choices that technically are vector stores rather than full databases. For production systems serving enterprise document volumes, a proper vector database (Pinecone, Weaviate, Qdrant, pgvector) is the practical standard.
What is the difference between vector search and keyword search?
Keyword search (like a SQL LIKE query or Elasticsearch BM25) finds documents that contain the exact words in the query. It matches tokens. Vector search finds documents that are semantically similar to the query — meaning they express similar ideas even with different words. Keyword search is precise when the exact term is known; vector search handles synonyms, paraphrasing, and conceptual similarity. Hybrid search combines both: it runs a vector search and a keyword search in parallel and merges the results, giving the precision of keyword matching plus the semantic breadth of vector search.
Which vector database is best for beginners?
For beginners, Chroma is the most accessible starting point — it requires no infrastructure, installs as a Python package, persists locally, and integrates directly with LangChain and LlamaIndex. FAISS is another common beginner choice for in-memory use. For a first cloud-managed experience, Pinecone offers a free tier and a simple API. The best choice depends on your use case: local prototype → Chroma or FAISS; first production deployment → Pinecone or pgvector if you already use PostgreSQL.
What is pgvector?
pgvector is a PostgreSQL extension that adds vector storage and similarity search capabilities to a PostgreSQL database. It allows teams to store embeddings as a column type alongside traditional relational data, and run similarity queries (cosine, L2, inner product) using SQL. pgvector is a strong choice for teams already running PostgreSQL who want to add RAG capabilities without introducing a new infrastructure component. It handles moderate scale well but is generally slower than dedicated vector databases like Pinecone or Qdrant at very large vector counts (tens of millions).
Is FAISS a vector database?
FAISS (Facebook AI Similarity Search) is an open-source library for efficient similarity search and clustering of dense vectors; it is technically a vector search library, not a full database. It operates in memory (indexes can be saved to and loaded from disk files) and has no built-in database persistence layer, API server, or metadata storage. Many teams use FAISS wrapped inside LangChain or a custom Python script for prototyping. For production deployments that need persistence, metadata filtering, API access, and scalability, a full vector database (Pinecone, Qdrant, Weaviate, pgvector) is the right choice.
Can vector databases store images and other non-text data?
Yes. Vector databases store vectors — they do not care what those vectors represent. The same database that stores text embeddings can store image embeddings (from models like CLIP), audio embeddings, code embeddings, or product catalogue embeddings. Multimodal search applications use this property to find images semantically similar to a text query, or to find products similar to an uploaded photo. The embedding model determines what is encoded; the vector database stores and retrieves the vectors regardless of their origin.
What skills are needed to work with vector databases as an AI engineer?
To work with vector databases as an AI engineer you need: (1) Python — to use embedding model libraries and vector database SDKs; (2) understanding of embeddings — what they are, how models differ, dimensions and token limits; (3) chunking strategy — how to split documents before embedding for optimal retrieval quality; (4) vector database fundamentals — indexing, metadata filtering, namespace design, similarity metrics; (5) LangChain or LlamaIndex — frameworks that abstract vector store operations; (6) retrieval evaluation — RAGAS, precision@K, faithfulness metrics; (7) production deployment concepts — latency, cost, access control. See the AI Engineer Skills guide for the full skill map.
Which Technovids resource should I read next?
If you are new to how vector databases are used in AI applications, read the complete What is RAG? guide — vector databases are the retrieval engine of most RAG systems. If you want to understand when to use RAG versus fine-tuning, see the RAG vs Fine-Tuning comparison. For the full AI engineering skill map including retrieval and deployment, see the AI Engineer Skills guide. For live structured training building RAG systems and AI applications in production, explore the AI Engineering Course.