A local authority's case management system had 15 years of records. Staff spent 45 minutes per case manually searching for relevant precedents. The keyword search was useless—"anti-social behaviour" didn't match records filed as "ASB", "nuisance", or "noise complaint". They'd tried synonyms, wildcards, and elaborate Boolean queries. Nothing worked.
We embedded their case archive and built a semantic search layer. Now the same search takes 3 seconds and returns results by meaning, not keywords. That's what embeddings do: they let machines understand language the way humans do.
[Figure: 2D projection of an embedding space. Words closer together have similar meanings or relationships, and words in the same category (colour) tend to cluster together.]
An embedding converts text (or images, audio, etc.) into a list of numbers called a vector. These vectors typically have hundreds or thousands of dimensions—OpenAI's text-embedding-3-large uses 3,072 dimensions, while smaller models might use 384.
The key insight is that similar concepts end up close together in this high-dimensional space. "King" and "queen" will have similar vectors, while "king" and "bicycle" will be far apart. This spatial relationship is what enables the operations covered below: semantic search, recommendations, retrieval for RAG, and anomaly detection.
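To make this concrete, here is a minimal sketch of generating embeddings through OpenAI's hosted API (assuming the `openai` Python client and the text-embedding-3-large model mentioned above; other providers follow the same pattern):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-large",
    input=["How do I return an item?", "What's your refund policy?"],
)

vectors = [item.embedding for item in response.data]
print(len(vectors[0]))  # 3072 dimensions for text-embedding-3-large
```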
Once you have embeddings, you measure similarity using distance metrics. The most common is cosine similarity, which measures the angle between two vectors:
[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># Cosine similarity ranges [#ff7b72]">from -[#79c0ff]">1 to [#79c0ff]">1[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#79c0ff]">1 = identical meaning, [#79c0ff]">0 = unrelated, -[#79c0ff]">1 = opposite [#ff7b72]">from numpy [#ff7b72]">import dot[#ff7b72]">from numpy.[#79c0ff]">linalg [#ff7b72]">import norm def [#d2a8ff]">cosine_similarity(a, b): [#ff7b72]">return [#d2a8ff]">dot(a, b) / ([#d2a8ff]">norm(a) * [#d2a8ff]">norm(b)) [#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># Example [#79c0ff]">results:[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#ff7b72]">class="text-[#a5d6ff]">"How do I [#ff7b72]">return an item?" vs [#ff7b72]">class="text-[#a5d6ff]">"What's your refund policy?" → [#79c0ff]">0.89[#ff7b72]">class=[#ff7b72]">class="text-[#a5d6ff]">"text-[#8b949e]"># [#ff7b72]">class="text-[#a5d6ff]">"How do I [#ff7b72]">return an item?" vs [#ff7b72]">class="text-[#a5d6ff]">"What's the weather today?" → [#79c0ff]">0.12Other metrics include Euclidean distance and dot product. Most vector databases support multiple distance functions, and the best choice depends on your embedding model and use case.
The embedding model you choose significantly impacts quality, cost, and latency. Here are the main options:
| Option | When to Use | Trade-offs |
|---|---|---|
| OpenAI / Cohere | General text, quick setup, don't want to manage infrastructure | Per-token costs, data leaves your infrastructure, generic performance on domain text |
| Voyage AI (Domain) | Legal, medical, financial, or code-heavy content | Higher per-token cost, but significantly better retrieval for specialised content |
| Open-Source (BGE, E5) | Data privacy requirements, cost sensitivity at scale, need customisation | Requires GPU infrastructure, model management overhead, need ML expertise |
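If you take the open-source route, the sketch below uses the sentence-transformers library with a small BGE model (the specific model name is an assumption for illustration; larger variants trade latency for quality):

```python
from sentence_transformers import SentenceTransformer

# A 384-dimension open-source embedding model from the BGE family
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

texts = ["How do I return an item?", "What's your refund policy?"]
embeddings = model.encode(texts, normalize_embeddings=True)

# With normalised vectors, the dot product is the cosine similarity
print(embeddings[0] @ embeddings[1])
```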
General-purpose embeddings like OpenAI's are trained on broad internet text. They work well for general content but struggle with domain-specific terminology, where jargon, abbreviations, and terms of art carry meanings that rarely appear in general training data.
For high-stakes retrieval in specialised domains, consider domain-specific models (Voyage AI offers legal and medical variants) or fine-tuning open-source models on your corpus. The retrieval quality difference can be dramatic—we've seen 25-40% improvements in recall when switching from general-purpose to domain-specific embeddings.
Embeddings need specialised storage for efficient similarity search at scale. Traditional databases can't efficiently query "find the 10 most similar vectors to this one" across millions of records. Vector databases solve this with approximate nearest neighbour (ANN) algorithms.
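As a rough sketch of what that looks like in code (assuming FAISS with an HNSW index as the ANN structure; managed vector databases expose equivalent query APIs):

```python
import numpy as np
import faiss

dim = 384  # e.g. a small open-source embedding model

# HNSW is an approximate-nearest-neighbour structure; inner product on
# normalised vectors is equivalent to cosine similarity
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)

doc_vectors = np.random.rand(10_000, dim).astype("float32")  # stand-in for real embeddings
faiss.normalize_L2(doc_vectors)
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)

scores, ids = index.search(query, 10)  # the 10 most similar documents
```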
Selection criteria: Consider scale requirements, hosting preferences, query complexity (filtering, hybrid search), and your existing infrastructure.
Traditional keyword search fails when users describe what they want differently than how it's documented. Embeddings understand that "how to cancel my subscription" should match an article titled "Account Termination Process".
Product recommendations become more nuanced. Instead of "customers also bought", you can find products with similar descriptions, use cases, or customer reviews—even for new products with no purchase history.
Embeddings power the retrieval step in RAG systems. When a user asks a question, embeddings find the most relevant documents from your knowledge base to provide context to the LLM.
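A simplified sketch of that retrieval step (the `embed_text`, `vector_store`, and `llm` arguments are hypothetical placeholders for your embedding client, vector database, and model call):

```python
def answer_with_rag(question, llm, embed_text, vector_store, k=5):
    # 1. Embed the user's question
    query_vector = embed_text(question)

    # 2. Retrieve the k most similar chunks from the knowledge base
    chunks = vector_store.search(query_vector, k=k)

    # 3. Pass the retrieved chunks to the LLM as context
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```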
Embed transaction descriptions, user behaviours, or support tickets. Anomalies appear as vectors far from normal patterns, enabling early detection of fraud, unusual activity, or emerging issues.
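A minimal sketch of the idea, assuming your items are already embedded as rows of a NumPy array (the similarity threshold is illustrative and would need tuning on real data):

```python
import numpy as np

def flag_anomalies(embeddings: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Return indices of items whose embedding sits far from the bulk of the data."""
    # Normalise rows so dot products are cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Use the centroid of the (mostly normal) data as the reference point
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    similarity_to_centroid = unit @ centroid
    return np.where(similarity_to_centroid < threshold)[0]
```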
- **Benchmark embedding models on your actual data.** Generic benchmarks don't predict domain-specific performance.
- **Plan your chunking strategy around document structure.** Fixed-size chunks split sentences and lose context (see the sketch after this list).
- **Store metadata alongside vectors for filtering.** Without metadata, you can't filter by date, type, or permissions.
- **Set up a re-embedding pipeline for model updates.** Changing models requires re-embedding everything.
- **Consider data privacy requirements early.** API-based embeddings send data to third parties.
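On the chunking point above, a rough sketch of structure-aware chunking that groups whole paragraphs rather than cutting at a fixed character count (the size limit is an illustrative assumption):

```python
def chunk_by_paragraph(document: str, max_chars: int = 1500) -> list[str]:
    """Group whole paragraphs into chunks rather than splitting mid-sentence."""
    chunks, current = [], ""
    for paragraph in document.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```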
Embeddings are the foundation of modern semantic AI. They're what let machines understand that "I want to cancel my account" and "how do I close my subscription" are asking the same thing—something keyword search could never do.
For most teams, the path is clear: start with a hosted API to prove the concept, benchmark against your actual queries, then decide whether domain-specific or open-source models are worth the additional complexity. The ecosystem is mature enough that embeddings should be a standard part of any AI-powered application.
Building semantic search for sensitive or domain-specific data? We help organisations choose the right embedding strategy and deploy search systems that work.
Learn about our method →