How to build a vector search from scratch in Python

by ai-intensify

# Introduction

You’ve probably typed a question into a search bar and received results that matched your words but completely missed your meaning. Or you’ve seen a recommendation engine surface something highly relevant even though you never searched for it directly. The difference between “finding the exact words” and “understanding what someone meant” is what makes a search feature useful.

Vector search closes that gap by representing text as points in a high-dimensional space, where geometric proximity encodes semantic similarity. Two sentences can share zero words and still be neighbors because the embedding model has learned that their meanings are close.

This article builds a vector search engine from scratch using only Python and NumPy (plus Matplotlib for the plots), so you can see exactly what happens at each step: how the embeddings are stored and normalized, why cosine similarity reduces to a dot product, and what the resulting search space actually looks like when you project it into two dimensions.

You can get the code on GitHub.

# What is vector search?

Traditional keyword search finds exact word matches. Vector search works differently: it converts documents and queries into numerical vectors called embeddings, then finds the vectors that are closest to each other in a high-dimensional space.

The main insight is that proximity in vector space means semantic similarity. Two sentences that mean the same thing will have embeddings close to each other, even if they share no words.

The distance metric you use to measure “proximity” drives the entire system. The most common choice is cosine similarity, which measures the angle between two vectors rather than the distance between their endpoints. This makes it scale-invariant, which is useful when you care about direction (meaning) rather than magnitude (for example, word count).
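To make the scale-invariance point concrete, here is a small sketch with made-up 3-dimensional vectors (illustrative values, not taken from the catalog). Vector b is a scaled copy of a, while c points in a different direction.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (||a|| * ||b||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors: b has the same direction as a, c does not
a = np.array([1.0, 2.0, 3.0])
b = 10 * a                      # same direction, 10x the magnitude
c = np.array([3.0, 2.0, 1.0])   # different direction, same magnitude as a

cos_ab = cosine_similarity(a, b)
cos_ac = cosine_similarity(a, c)
dist_ab = float(np.linalg.norm(a - b))   # Euclidean distance
dist_ac = float(np.linalg.norm(a - c))

print(f"cos(a, b) = {cos_ab:.4f}   euclidean(a, b) = {dist_ab:.4f}")
print(f"cos(a, c) = {cos_ac:.4f}   euclidean(a, c) = {dist_ac:.4f}")
```

Cosine similarity gives cos(a, b) = 1 (up to floating-point rounding) and cos(a, c) = 10/14, so b ranks first. Euclidean distance says the opposite: b is 9·√14 away from a, while c is only √8 away, purely because b is longer.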

# Setting up the dataset

We’ll be working with a small set of short product descriptions from a hypothetical e-commerce catalog. Each description is represented by an 8-dimensional embedding. That is deliberately far lower than what real embedding models produce (often hundreds of dimensions), but it is enough to demonstrate the concepts.

In a real system, you would generate these embeddings with a model such as those in the sentence-transformers library. For this tutorial, we simulate that step with controlled random data that has a clear cluster structure.

import numpy as np

np.random.seed(42)

# Product catalog: 3 semantic clusters (electronics, clothing, furniture)
products = [
    "Wireless noise-cancelling headphones with 30-hour battery",
    "Bluetooth speaker with waterproof design",
    "USB-C hub with 7 ports and power delivery",
    "4K HDMI cable 6ft braided",
    "Mechanical keyboard with RGB backlight",
    "Men's slim-fit chino pants navy blue",
    "Women's merino wool turtleneck sweater",
    "Unisex running jacket lightweight windbreaker",
    "Leather chelsea boots for men",
    "Organic cotton crew neck t-shirt",
    "Solid oak dining table seats 6",
    "Ergonomic mesh office chair lumbar support",
    "Linen sofa 3-seater natural beige",
    "Bamboo bookshelf 5-tier adjustable",
    "Memory foam mattress queen size medium firm",
]

# Simulate embeddings with cluster structure
# Cluster centers in 8D space
electronics_center = np.array([0.9, 0.1, 0.2, 0.8, 0.1, 0.3, 0.7, 0.2])
clothing_center    = np.array([0.1, 0.8, 0.7, 0.1, 0.9, 0.2, 0.1, 0.8])
furniture_center   = np.array([0.2, 0.3, 0.9, 0.2, 0.1, 0.9, 0.3, 0.1])

n_per_cluster = 5
noise = 0.08  # standard deviation of the Gaussian noise added to each center

# Rows 0-4: electronics, rows 5-9: clothing, rows 10-14: furniture
embeddings = np.vstack([
    electronics_center + np.random.randn(n_per_cluster, 8) * noise,
    clothing_center    + np.random.randn(n_per_cluster, 8) * noise,
    furniture_center   + np.random.randn(n_per_cluster, 8) * noise,
])

print(f"Embeddings shape: {embeddings.shape}")

Output:

Embeddings shape: (15, 8)

Each row is a product, and each column is one dimension of its embedding. The search engine never looks at the product names; only the embeddings matter.

[Image by author]

# Creating the index

An “index” in a vector search engine is simply a stored set of normalized embeddings. Normalization is important here because it makes the cosine similarity equivalent to a dot product, which is cheaper to compute.
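The equivalence follows from the definition cos(a, b) = a·b / (‖a‖‖b‖): once every vector has unit length, the denominator is 1. The sketch below checks this on random vectors (the data and variable names are illustrative). It also checks a related identity for unit vectors, ‖a - b‖² = 2 - 2·cos(a, b), which means that ranking by smallest Euclidean distance gives the same order as ranking by largest dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(6, 8))   # 6 illustrative 8-D "document" vectors
query = rng.normal(size=8)       # 1 illustrative query vector

# Full cosine formula: dot product divided by the product of the norms
cosine_full = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Normalize once up front; afterwards a plain dot product is the cosine
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
cosine_dot = docs_unit @ query_unit

# Squared Euclidean distance between unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
sq_dists = np.sum((docs_unit - query_unit) ** 2, axis=1)

print(np.allclose(cosine_full, cosine_dot))
print(np.allclose(sq_dists, 2 - 2 * cosine_dot))
# Highest cosine first vs. smallest distance first: the orders should match
print(np.array_equal(np.argsort(-cosine_dot), np.argsort(sq_dists)))
```

Normalizing at indexing time moves the division out of the query path: each search then costs only one dot product per stored vector.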

def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row vector."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Avoid division by zero
    norms = np.where(norms == 0, 1e-10, norms)
    return vectors / norms

class VectorIndex:
    def __init__(self):
        self.vectors = None
        self.labels = None

    def add(self, vectors: np.ndarray, labels: list):
        self.vectors = normalize(vectors)
        self.labels = labels
        print(f"Indexed {len(labels)} items with {vectors.shape[1]}-dimensional embeddings.")

    def search(self, query_vector: np.ndarray, top_k: int = 3):
        query_norm = normalize(query_vector.reshape(1, -1))
        # Cosine similarity = dot product of normalized vectors
        scores = self.vectors @ query_norm.T  # shape: (n_items, 1)
        scores = scores.flatten()
        # Get top-k indices sorted by descending score
        top_indices = np.argsort(scores)[::-1][:top_k]
        # Return a list of (label, score) pairs, best match first
        return [(self.labels[i], float(scores[i])) for i in top_indices]

index = VectorIndex()
index.add(embeddings, products)

Output:

Indexed 15 items with 8-dimensional embeddings.

The search method does three things: it normalizes the query, computes the dot product against every stored vector, and then sorts by score and returns the top-k results. That matrix multiplication (self.vectors @ query_norm.T) is the entire retrieval step.
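To see why one matrix multiplication covers the whole retrieval step, here is a sketch that computes the same scores with an explicit Python loop (one dot product per stored vector) and compares them with the vectorized version. The tiny 2-D index and query are made up for illustration.

```python
import numpy as np

# Illustrative index: 4 stored vectors, L2-normalized row by row
stored = np.array([[3.0, 4.0], [1.0, 0.0], [1.0, 3.0], [-1.0, 1.0]])
stored = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query = np.array([1.0, 1.0]) / np.sqrt(2)   # already unit length

# Loop version: one dot product per stored vector
loop_scores = np.array([float(np.dot(row, query)) for row in stored])

# Vectorized version, mirroring self.vectors @ query_norm.T in VectorIndex
matmul_scores = (stored @ query.reshape(-1, 1)).flatten()

# Same top-k selection as VectorIndex.search
top_k = 2
top_indices = np.argsort(matmul_scores)[::-1][:top_k]
print(loop_scores.round(4), top_indices)
```

Both versions produce the same scores (7/(5√2), 1/√2, 4/√20 and 0 here), so the top-2 results are rows 0 and 2. The vectorized form simply hands the loop to NumPy's optimized linear algebra routines.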

# Running queries

Now let’s test the index with a few queries. We construct each query vector by starting from one of the cluster centers and adding a little noise to simulate a real query embedding.

def make_query(center: np.ndarray, noise_scale: float = 0.05) -> np.ndarray:
    return center + np.random.randn(8) * noise_scale


queries = {
    "audio equipment": make_query(electronics_center),
    "casual wear":     make_query(clothing_center),
    "home furniture":  make_query(furniture_center),
}

for query_name, q_vec in queries.items():
    print(f"\nQuery: '{query_name}'")
    results = index.search(q_vec, top_k=3)
    for rank, (label, score) in enumerate(results, 1):
        print(f"  {rank}. ({score:.4f}) {label}")

Output:


Query: 'audio equipment'
  1. (0.9856) Wireless noise-cancelling headphones with 30-hour battery
  2. (0.9840) USB-C hub with 7 ports and power delivery
  3. (0.9829) Mechanical keyboard with RGB backlight

Query: 'casual wear'
  1. (0.9960) Men's slim-fit chino pants navy blue
  2. (0.9958) Leather chelsea boots for men
  3. (0.9916) Women's merino wool turtleneck sweater

Query: 'home furniture'
  1. (0.9929) Bamboo bookshelf 5-tier adjustable
  2. (0.9902) Linen sofa 3-seater natural beige
  3. (0.9881) Solid oak dining table seats 6

Scores close to 1.0 mean nearly identical directions in the embedding space, which is exactly what you would expect for queries constructed from the same cluster center as their target documents.
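The loop above calls search once per query. As a sketch of a natural extension (not part of the VectorIndex class), several normalized queries can be stacked into a matrix and scored against the whole index in a single multiplication, where each column of the result holds one query's scores. The data is randomly generated for illustration, and l2_normalize is a stand-in for the tutorial's normalize() helper.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization (same idea as the tutorial's normalize())."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1e-10, norms)

rng = np.random.default_rng(7)
index_vectors = l2_normalize(rng.normal(size=(15, 8)))  # illustrative 15-item index
query_batch = l2_normalize(rng.normal(size=(3, 8)))     # 3 illustrative queries

# (15, 8) @ (8, 3) -> (15, 3): column j holds every item's score for query j
score_matrix = index_vectors @ query_batch.T

top_k = 3
# Sort each column in descending order, keep the first top_k rows,
# then transpose so row j lists the best item indices for query j
top_per_query = np.argsort(-score_matrix, axis=0)[:top_k].T
print(top_per_query.shape)
```

Each column matches what a separate single-query search would compute, so batching changes the cost profile but not the results.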

# Visualizing embedding space

It is difficult to reason visually about high-dimensional data. Principal component analysis (PCA) projects the 8-dimensional embeddings into 2D so we can see the cluster structure. We will implement a minimal PCA using only NumPy.
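In matrix form, the projection is short. Assuming the n embeddings are stacked as the rows of X ∈ ℝ^{n×8} with column mean μ, PCA centers the data, takes its singular value decomposition, and keeps the first two right singular vectors (the principal axes):

```latex
\tilde{X} = X - \mathbf{1}\mu^{\top} = U \Sigma V^{\top}, \qquad
Z = \tilde{X}\, V_{:,1:2} \in \mathbb{R}^{n \times 2}, \qquad
z_q = (q - \mu)^{\top} V_{:,1:2}
```

The last term is how a query q would be placed in the same 2D coordinates: it reuses the mean μ and axes V fitted on the product embeddings rather than fitting a new PCA on the queries.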

The following code computes a 2D PCA projection and plots all product embeddings, colored by cluster:

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

def fit_pca_2d(data: np.ndarray):
    """Minimal PCA via SVD: return the data mean and the top-2 principal axes.
    (A sketch standing in for the pca_2d helper, whose definition is not shown.)"""
    mean = data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:2]

# Fit PCA on the product embeddings, then project them onto the 2 axes
pca_mean, pca_axes = fit_pca_2d(embeddings)
projected = (embeddings - pca_mean) @ pca_axes.T   # shape: (15, 2)

cluster_colors = (
    ["#4A90D9"] * 5 +   # electronics: blue
    ["#E8734A"] * 5 +   # clothing: orange
    ["#5BAD72"] * 5     # furniture: green
)
cluster_labels = ["Electronics"] * 5 + ["Clothing"] * 5 + ["Furniture"] * 5

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(projected[:, 0], projected[:, 1],
           c=cluster_colors, s=100, edgecolors="white", linewidths=0.7, zorder=3)

This part projects the query vectors into the same 2D space, overlays them as stars, and finalizes the plot:

# Plot query projections, reusing the mean and axes fitted on the products
q_projected = (np.vstack(list(queries.values())) - pca_mean) @ pca_axes.T
for (qname, _), (qx, qy) in zip(queries.items(), q_projected):
    ax.scatter(qx, qy, marker="*", s=200, color="gold",
               edgecolors="#333", linewidths=0.6, zorder=4)
    ax.annotate(f"⟵ query: {qname}", (qx, qy),
                textcoords="offset points", xytext=(6, -8),
                fontsize=7, color="#555555", style="italic")

legend_patches = [
    mpatches.Patch(color="#4A90D9", label="Electronics"),
    mpatches.Patch(color="#E8734A", label="Clothing"),
    mpatches.Patch(color="#5BAD72", label="Furniture"),
    mpatches.Patch(color="gold",    label="Query vectors"),
]
ax.legend(handles=legend_patches, loc="upper left", fontsize=6)
ax.set_title("Vector Search - Embedding Space (PCA projection)", fontsize=10, pad=10)
ax.set_xlabel("PC 1"); ax.set_ylabel("PC 2")
ax.grid(True, linestyle="--", alpha=0.4)
plt.tight_layout()
plt.savefig("embedding_space_queries_only.png", dpi=150)
plt.show()

Output:

[Figure: Vector Search - Embedding Space (PCA projection)]

The clusters separate clearly. Each gold star (query vector) lands inside the cluster from which it was created. This is the geometry that vector search uses.
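A 2D picture is only trustworthy if the first two components capture most of the spread in the data. As a hedged check, the sketch below re-creates the tutorial's embeddings (same cluster centers, seed, and noise) and computes the fraction of variance each principal component keeps: its squared singular value divided by the sum of all squared singular values. The explained_variance_ratio name is introduced here for illustration.

```python
import numpy as np

# Re-create the tutorial's simulated embeddings (same seed and call order)
np.random.seed(42)
centers = np.array([
    [0.9, 0.1, 0.2, 0.8, 0.1, 0.3, 0.7, 0.2],   # electronics
    [0.1, 0.8, 0.7, 0.1, 0.9, 0.2, 0.1, 0.8],   # clothing
    [0.2, 0.3, 0.9, 0.2, 0.1, 0.9, 0.3, 0.1],   # furniture
])
embeddings = np.vstack([c + np.random.randn(5, 8) * 0.08 for c in centers])

def explained_variance_ratio(data: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)  # descending
    variances = singular_values ** 2
    return variances / variances.sum()

ratio = explained_variance_ratio(embeddings)
print(f"PC1 + PC2 keep {ratio[:2].sum():.1%} of the variance")
```

Three cluster centers always lie in a 2D plane, so with low noise nearly all of the between-cluster spread fits in two components. With real embeddings this ratio is usually much lower, and the 2D view should be read as a rough sketch.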

# Visualizing similarity score distribution

For any given query, it is useful to see how the similarity scores are distributed across the entire index, not just the top-k. This tells you whether the top result is a clear winner or only slightly better than everything else.
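One way to put a number on "clear winner" is the margin between the best score and the runner-up, alongside the gap between the k-th and (k+1)-th scores. The score_margins helper below is a hypothetical sketch, not part of the tutorial's code. It works on any 1D array of similarity scores.

```python
import numpy as np

def score_margins(scores: np.ndarray, top_k: int = 3) -> dict:
    """Hypothetical helper: summarize how decisively the top results win.

    Returns the top-1 vs top-2 margin and the gap between the k-th and
    (k+1)-th highest scores.
    """
    ranked = np.sort(np.asarray(scores))[::-1]   # highest score first
    if ranked.size <= top_k:
        raise ValueError("need more than top_k scores to measure a gap")
    return {
        "top1_minus_top2": float(ranked[0] - ranked[1]),
        "kth_minus_next": float(ranked[top_k - 1] - ranked[top_k]),
    }

# Illustrative scores: three strong matches, then a sharp drop
example_scores = np.array([0.52, 0.99, 0.48, 0.97, 0.98, 0.50])
margins = score_margins(example_scores, top_k=3)
print(margins)
```

Here the top-1 margin is tiny (about 0.01), but the gap after the third result is large (about 0.45). There is no single winner, but there is a clearly separated top group.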

q_vec_furniture = queries["home furniture"]
q_norm_furniture = normalize(q_vec_furniture.reshape(1, -1))
all_scores_furniture = (index.vectors @ q_norm_furniture.T).flatten()

# Sort all products by descending score; truncate long names for the y-axis
sorted_idx_furniture = np.argsort(all_scores_furniture)[::-1]
sorted_scores_furniture = all_scores_furniture[sorted_idx_furniture]
sorted_labels_furniture = [products[i][:30] + "…" if len(products[i]) > 30
                           else products[i] for i in sorted_idx_furniture]

# Define bar colors: green for furniture items, gray for others
bar_colors_furniture = []
for i in sorted_idx_furniture:
    if 10 <= i <= 14:  # Furniture items are at indices 10-14 of products
        bar_colors_furniture.append("#5BAD72")  # Green for furniture
    else:
        bar_colors_furniture.append("#cccccc")  # Gray for others

fig, ax = plt.subplots(figsize=(10, 5))
# Reverse the order so the highest score is drawn at the top of the chart
bars = ax.barh(sorted_labels_furniture[::-1], sorted_scores_furniture[::-1],
               color=bar_colors_furniture[::-1], edgecolor="white", height=0.65)

# Dashed line at the 3rd-highest score marks the top-3 cutoff
ax.axvline(sorted_scores_furniture[2], color="#5BAD72", linestyle="--",
           linewidth=1.2, label="Top-3 cutoff")
ax.set_xlim(sorted_scores_furniture.min() - 0.002, 1.001)
ax.set_xlabel("Cosine Similarity Score")
ax.set_title("Query: 'home furniture' - Similarity Across All Products", fontsize=11, pad=12)
ax.legend(fontsize=8)
ax.grid(axis="x", linestyle="--", alpha=0.4)
plt.tight_layout()
plt.savefig("score_distribution_furniture.png", dpi=150)
plt.show()

Output:

[Figure: Query: 'home furniture' - Similarity Across All Products]

There is a clear gap between the furniture cluster (the top five bars) and everything else. In practice, you could use a gap like this to set a similarity threshold below which results are not returned at all.
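A minimal sketch of that idea, assuming a fixed cutoff that you would tune on your own data (the search_with_threshold name and the 0.9 default are illustrative, not from the tutorial): score everything, sort, and keep at most top_k results whose score is at or above the threshold.

```python
import numpy as np

def search_with_threshold(index_vectors: np.ndarray, labels: list,
                          query: np.ndarray, top_k: int = 3,
                          min_score: float = 0.9) -> list:
    """Return up to top_k (label, score) pairs with cosine score >= min_score.

    Assumes the rows of index_vectors are already L2-normalized.
    """
    q = query / max(np.linalg.norm(query), 1e-10)   # normalize the query
    scores = index_vectors @ q                      # cosine similarities
    order = np.argsort(scores)[::-1][:top_k]        # best top_k first
    return [(labels[i], float(scores[i])) for i in order if scores[i] >= min_score]

# Illustrative 2-D index with unit-length rows
vecs = np.array([[1.0, 0.0], [0.96, 0.28], [0.0, 1.0]])
names = ["near match", "close match", "unrelated"]
hits = search_with_threshold(vecs, names, np.array([2.0, 0.0]), top_k=3)
print(hits)
```

Even though top_k is 3, only two results come back: the "unrelated" item scores 0 and falls below the cutoff. A query with no good matches can therefore return an empty list instead of three poor ones.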

# Wrapping up

You built a vector search engine with about 50 lines of NumPy: an index class that normalizes and stores the embeddings, a search method that uses matrix multiplication to calculate cosine similarity, and two visualizations that reveal the geometry behind the results.

The next step is to replace the simulated embeddings with real ones. Try loading a model with the sentence-transformers library and embedding your own text corpus. The index code here should work without any changes, since VectorIndex does not depend on the embedding dimensionality.

If you’d like to read more “from scratch” articles, let us know what you’d like to see next!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by writing tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
