A local authority's case management system had 15 years of records. Staff spent 45 minutes per case manually searching for relevant precedents. The keyword search was useless—"anti-social behaviour" didn't match records filed as "ASB", "nuisance", or "noise complaint". They'd tried synonyms, wildcards, and elaborate Boolean queries. Nothing worked.
We embedded their case archive and built a semantic search layer. Now the same search takes 3 seconds and returns results by meaning, not keywords. That's what embeddings do: they let machines understand language the way humans do.
[Figure: 2D projection of an embedding space. Words closer together have similar meanings or relationships, and words in the same category (colour) tend to cluster together.]
An embedding converts text (or images, audio, etc.) into a list of numbers called a vector. These vectors typically have hundreds or thousands of dimensions—OpenAI's text-embedding-3-large uses 3,072 dimensions, while smaller models might use 384.
The key insight is that similar concepts end up close together in this high-dimensional space. "King" and "queen" will have similar vectors, while "king" and "bicycle" will be far apart. This spatial relationship is what enables the operations covered below: semantic search, recommendations, retrieval for RAG, and anomaly detection.
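To make this concrete, here is a minimal sketch of generating embeddings through OpenAI's hosted API (assuming the `openai` Python client and the text-embedding-3-large model mentioned above; other providers follow the same pattern):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-large",
    input=["How do I return an item?", "What's your refund policy?"],
)

vectors = [item.embedding for item in response.data]
print(len(vectors[0]))  # 3072 dimensions for text-embedding-3-large
```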
Once you have embeddings, you measure similarity using distance metrics. The most common is cosine similarity, which measures the angle between two vectors:
[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># Cosine similarity ranges [#ff7b72]">from -[#79c0ff]">1 to [#79c0ff]">1[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#79c0ff]">1 = identical meaning, [#79c0ff]">0 = unrelated, -[#79c0ff]">1 = opposite [#ff7b72]">from numpy [#ff7b72]">import dot[#ff7b72]">from numpy.[#79c0ff]">linalg [#ff7b72]">import norm def [#d2a8ff]">cosine_similarity(a, b): [#ff7b72]">return [#d2a8ff]">dot(a, b) / ([#d2a8ff]">norm(a) * [#d2a8ff]">norm(b)) [#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># Example [#79c0ff]">results:[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#ff7b72]">class="text-[#a5d6ff]">"How do I [#ff7b72]">return an item?" vs [#ff7b72]">class="text-[#a5d6ff]">"What's your refund policy?" → [#79c0ff]">0.89[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#ff7b72]">class="text-[#a5d6ff]">"How do I [#ff7b72]">return an item?" vs [#ff7b72]">class="text-[#a5d6ff]">"What's the weather today?" → [#79c0ff]">0.12Other metrics include Euclidean distance and dot product. Most vector databases support multiple distance functions, and the best choice depends on your embedding model and use case.
The embedding model you choose significantly impacts quality, cost, and latency. Here are the main options:
| Option | When to Use | Trade-offs |
|---|---|---|
| OpenAI / Cohere | General text, quick setup, don't want to manage infrastructure | Per-token costs, data leaves your infrastructure, generic performance on domain text |
| Voyage AI (Domain) | Legal, medical, financial, or code-heavy content | Higher per-token cost, but significantly better retrieval for specialised content |
| Open-Source (BGE, E5) | Data privacy requirements, cost sensitivity at scale, need customisation | Requires GPU infrastructure, model management overhead, need ML expertise |
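If you take the open-source route, the sketch below uses the sentence-transformers library with a small BGE model (the specific model name is an assumption for illustration; larger variants trade latency for quality):

```python
from sentence_transformers import SentenceTransformer

# A 384-dimension open-source embedding model from the BGE family
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

texts = ["How do I return an item?", "What's your refund policy?"]
embeddings = model.encode(texts, normalize_embeddings=True)

# With normalised vectors, the dot product is the cosine similarity
print(embeddings[0] @ embeddings[1])
```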
General-purpose embeddings like OpenAI's are trained on broad internet text. They work well for general content but struggle with domain-specific terminology, where jargon, abbreviations, and terms of art carry meanings that rarely appear in general training data.
For high-stakes retrieval in specialised domains, consider domain-specific models (Voyage AI offers legal and medical variants) or fine-tuning open-source models on your corpus. The retrieval quality difference can be dramatic—we've seen 25-40% improvements in recall when switching from general-purpose to domain-specific embeddings.
Embeddings need specialised storage for efficient similarity search at scale. Traditional databases can't efficiently query "find the 10 most similar vectors to this one" across millions of records. Vector databases solve this with approximate nearest neighbour (ANN) algorithms.
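As a rough sketch of what that looks like in code (assuming FAISS with an HNSW index as the ANN structure; managed vector databases expose equivalent query APIs):

```python
import numpy as np
import faiss

dim = 384  # e.g. a small open-source embedding model

# HNSW is an approximate-nearest-neighbour structure; inner product on
# normalised vectors is equivalent to cosine similarity
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)

doc_vectors = np.random.rand(10_000, dim).astype("float32")  # stand-in for real embeddings
faiss.normalize_L2(doc_vectors)
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)

scores, ids = index.search(query, 10)  # the 10 most similar documents
```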
Selection criteria: Consider scale requirements, hosting preferences, query complexity (filtering, hybrid search), and your existing infrastructure.
Traditional keyword search fails when users describe what they want differently than how it's documented. Embeddings understand that "how to cancel my subscription" should match an article titled "Account Termination Process".
Product recommendations become more nuanced. Instead of "customers also bought", you can find products with similar descriptions, use cases, or customer reviews—even for new products with no purchase history.
Embeddings power the retrieval step in RAG systems. When a user asks a question, embeddings find the most relevant documents from your knowledge base to provide context to the LLM.
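A simplified sketch of that retrieval step (the `embed_text`, `vector_store`, and `llm` arguments are hypothetical placeholders for your embedding client, vector database, and model call):

```python
def answer_with_rag(question, llm, embed_text, vector_store, k=5):
    # 1. Embed the user's question
    query_vector = embed_text(question)

    # 2. Retrieve the k most similar chunks from the knowledge base
    chunks = vector_store.search(query_vector, k=k)

    # 3. Pass the retrieved chunks to the LLM as context
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```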
Embed transaction descriptions, user behaviours, or support tickets. Anomalies appear as vectors far from normal patterns, enabling early detection of fraud, unusual activity, or emerging issues.
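A minimal sketch of the idea, assuming your items are already embedded as rows of a NumPy array (the similarity threshold is illustrative and would need tuning on real data):

```python
import numpy as np

def flag_anomalies(embeddings: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Return indices of items whose embedding sits far from the bulk of the data."""
    # Normalise rows so dot products are cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Use the centroid of the (mostly normal) data as the reference point
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    similarity_to_centroid = unit @ centroid
    return np.where(similarity_to_centroid < threshold)[0]
```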
- **Benchmark embedding models on your actual data.** Generic benchmarks don't predict domain-specific performance.
- **Plan your chunking strategy around document structure.** Fixed-size chunks split sentences and lose context (see the sketch after this list).
- **Store metadata alongside vectors for filtering.** Without metadata, you can't filter by date, type, or permissions.
- **Set up a re-embedding pipeline for model updates.** Changing models requires re-embedding everything.
- **Consider data privacy requirements early.** API-based embeddings send data to third parties.
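On the chunking point above, a rough sketch of structure-aware chunking that groups whole paragraphs rather than cutting at a fixed character count (the size limit is an illustrative assumption):

```python
def chunk_by_paragraph(document: str, max_chars: int = 1500) -> list[str]:
    """Group whole paragraphs into chunks rather than splitting mid-sentence."""
    chunks, current = [], ""
    for paragraph in document.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```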
Embeddings are the foundation of modern semantic AI. They're what let machines understand that "I want to cancel my account" and "how do I close my subscription" are asking the same thing—something keyword search could never do.
For most teams, the path is clear: start with a hosted API to prove the concept, benchmark against your actual queries, then decide whether domain-specific or open-source models are worth the additional complexity. The ecosystem is mature enough that embeddings should be a standard part of any AI-powered application.
Building semantic search for sensitive or domain-specific data? We help organisations choose the right embedding strategy and deploy search systems that work.
Learn about our method →