Meet TurboVec: a Rust vector indexer with Python bindings, built on Google's TurboQuant algorithm

Vector search underlies most retrieval-augmented generation (RAG) pipelines, and at scale it becomes expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For teams running local or on-premise inference, that number is a real hurdle.

A new open-source library called TurboVec addresses this directly. It is a vector indexer written in Rust with Python bindings, built on TurboQuant, a quantization algorithm from Google Research. With TurboVec, the same 10-million-document corpus fits in 4 GB. On ARM hardware, its search speed beats FAISS IndexPQFastScan by 12-20%.
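The 31 GB and 4 GB figures follow from simple arithmetic. The article does not state the embedding dimension, so the sketch below assumes d=768, the dimension at which 10 million float32 vectors come to about 31 GB (1 GB = 10^9 bytes here). It also assumes bit-packed 4-bit codes plus one float32 norm per vector; the helper names are hypothetical, not part of turbovec.

```python
# Back-of-the-envelope index sizes. Assumption: d = 768 (not stated in the
# article), chosen because it reproduces the ~31 GB float32 figure.


def float32_bytes(n: int, dim: int) -> int:
    """Raw float32 storage: 4 bytes per coordinate."""
    return n * dim * 4


def quantized_bytes(n: int, dim: int, bit_width: int) -> int:
    """Bit-packed codes plus one float32 norm per vector."""
    code_bytes = (dim * bit_width + 7) // 8  # round up to whole bytes
    return n * (code_bytes + 4)


n, dim = 10_000_000, 768
print(f"float32: {float32_bytes(n, dim) / 1e9:.2f} GB")      # float32: 30.72 GB
print(f"4-bit:   {quantized_bytes(n, dim, 4) / 1e9:.2f} GB")  # 4-bit:   3.88 GB
print(f"2-bit:   {quantized_bytes(n, dim, 2) / 1e9:.2f} GB")  # 2-bit:   1.96 GB
```

Under these assumptions the 4 GB figure corresponds to 4-bit codes; at 2 bits the codes alone are 16x smaller than float32.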

The TurboQuant paper

TurboQuant was introduced by Google Research. The team proposes it as a data-oblivious quantizer that achieves near-optimal distortion rates across all bit-widths and dimensions, while requiring zero training and zero passes over the data.

Most production-grade vector quantizers, including FAISS's product quantization, require a codebook training step: you run k-means on a representative sample of your vectors before indexing begins. If your corpus grows or changes, you may need to retrain and rebuild the index from scratch. TurboQuant skips all of that. It relies on an analytical property of rotated vectors instead of data-dependent calibration.

How TurboVec quantizes a vector

The quantization pipeline has four steps:

(1) Every vector is normalized. Its length (norm) is stripped off and stored as a single float, so each vector becomes a unit direction on the high-dimensional hypersphere.

(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution, approximately independently of the other coordinates, and in high dimensions this distribution converges to the Gaussian N(0, 1/d). This holds for any input data: the rotation makes the coordinate distribution predictable.

(3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be computed from the math alone, with no pass over the data. For 2-bit quantization this means 4 buckets per coordinate; for 4-bit, 16 buckets.

(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2 bits, a 16x compression ratio.
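The four steps can be sketched in pure Python. This is an illustration under stated assumptions, not turbovec's implementation: it uses the Gaussian N(0, 1/d) approximation instead of the exact Beta distribution, builds the random rotation by Gram-Schmidt on Gaussian rows, packs codes lowest-bits-first (a layout we chose), and runs at d=64 so the O(d³) rotation stays fast. All function names are hypothetical.

```python
import math
import random


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))


def lloyd_max_gaussian(bit_width, iters=500):
    """Step 3 codebook: Lloyd-Max centroids for N(0, 1), derived without data."""
    levels = 2 ** bit_width
    centroids = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Bucket boundaries sit halfway between neighbouring centroids.
        mids = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + mids + [math.inf]
        # Each centroid moves to the conditional mean of its bucket.
        centroids = [(_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
                     for lo, hi in zip(edges, edges[1:])]
    return centroids


def random_rotation(dim, seed=0):
    """Step 2: random orthogonal matrix from Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def quantize(vec, rotation, unit_centroids):
    """Steps 1-3: strip the norm, rotate, snap each coordinate to a centroid."""
    norm = math.sqrt(sum(x * x for x in vec))
    unit = [x / norm for x in vec]
    rotated = [sum(a * b for a, b in zip(row, unit)) for row in rotation]
    # Rotated coordinates are ~N(0, 1/d): rescale the N(0, 1) codebook by 1/sqrt(d).
    codebook = [c / math.sqrt(len(vec)) for c in unit_centroids]
    codes = [min(range(len(codebook)), key=lambda j: abs(x - codebook[j]))
             for x in rotated]
    return norm, codes, codebook


def pack(codes, bit_width):
    """Step 4: bit-pack 8 // bit_width codes per byte, lowest bits first."""
    per_byte = 8 // bit_width
    out = bytearray()
    for i in range(0, len(codes), per_byte):
        byte = 0
        for j, c in enumerate(codes[i:i + per_byte]):
            byte |= c << (j * bit_width)
        out.append(byte)
    return bytes(out)


def dequantize(norm, codes, codebook, rotation):
    """Inverse path: centroid lookup, inverse rotation (the transpose), rescale."""
    rotated = [codebook[c] for c in codes]
    dim = len(codes)
    return [norm * sum(rotation[i][k] * rotated[i] for i in range(dim))
            for k in range(dim)]


rng = random.Random(1)
dim, bits = 64, 2
rotation = random_rotation(dim)
vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
norm, codes, codebook = quantize(vec, rotation, lloyd_max_gaussian(bits))
packed = pack(codes, bits)
print(len(packed))  # 16 bytes: 64 coords x 2 bits (1536 coords would give 384)
```

Because the codebook depends only on the bit width and the dimension, nothing has to be retrained when new vectors arrive.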

At search time, the query is rotated once with the same matrix, and scoring happens directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86, with an AVX2 fallback) and nibble-split lookup tables for throughput.
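A rough sketch of lookup-table scoring, assuming 2-bit codes packed four per byte with the lowest bits first (our choice of layout). Because the rotation is orthogonal, the inner product between the rotated query and a rotated database vector equals the original inner product, so each score is the stored norm times a sum of query-coordinate × codebook-value products. Splitting each byte into two nibbles means every lookup hits a 16-entry table, the size a SIMD byte shuffle (NEON's TBL, x86's PSHUFB) can index in one instruction. turbovec's real kernel operates on SIMD registers and the article does not describe its table precision; the functions and codebook values here are hypothetical.

```python
import math


def pack_2bit(codes):
    """Pack 2-bit codes four per byte, lowest bits first (assumed layout)."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= c << (2 * j)
        out.append(byte)
    return bytes(out)


def build_nibble_luts(q_rot, codebook):
    """One 16-entry table per nibble; each nibble holds two 2-bit codes."""
    luts = []
    for i in range(0, len(q_rot), 2):
        table = []
        for nib in range(16):
            lo, hi = nib & 0b11, nib >> 2
            table.append(q_rot[i] * codebook[lo] + q_rot[i + 1] * codebook[hi])
        luts.append(table)
    return luts


def score(packed, norm, luts):
    """Approximate <query, vector>: two 16-entry lookups per byte, then rescale."""
    total = 0.0
    for b, byte in enumerate(packed):
        total += luts[2 * b][byte & 0x0F]    # coordinates 4b, 4b+1
        total += luts[2 * b + 1][byte >> 4]  # coordinates 4b+2, 4b+3
    return norm * total


# Toy example: an already-rotated query, plus one stored vector's codes and norm.
codebook = [-1.5, -0.5, 0.5, 1.5]  # illustrative values only
q_rot = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9, 1.4, -0.7]
codes = [3, 0, 2, 1, 1, 3, 0, 2]
norm = 2.0

luts = build_nibble_luts(q_rot, codebook)  # built once per query
approx = score(pack_2bit(codes), norm, luts)
direct = norm * sum(q * codebook[c] for q, c in zip(q_rot, codes))
print(math.isclose(approx, direct))  # True
```

The tables are built once per query and reused for every stored vector, which is what makes the per-vector cost a handful of lookups and adds.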

TurboQuant achieves distortion to within about 2.7x of the information-theoretic Shannon lower bound.
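For reference, our reading of the TurboQuant paper is that the ~2.7x figure is the ratio between its mean-squared-error upper bound and the information-theoretic lower bound, for unit-norm vectors quantized at b bits per coordinate. Consult the paper for the exact theorem statements; the notation below is ours.

```latex
D_{\mathrm{mse}} = \mathbb{E}\,\lVert x - \hat{x} \rVert_2^2, \qquad \lVert x \rVert_2 = 1

\frac{1}{4^{b}} \;\le\; D_{\mathrm{mse}}^{\mathrm{TurboQuant}} \;\le\; \frac{\sqrt{3}\,\pi}{2}\cdot\frac{1}{4^{b}},
\qquad \frac{\sqrt{3}\,\pi}{2} \approx 2.72
```

At b = 2, for example, the lower bound is 1/16 = 0.0625 and the upper bound is about 2.72/16 ≈ 0.170.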

Memory and speed: the numbers

All benchmarks use 100K vectors, 1,000 queries, k=64, and report the mean of 5 runs.

For recall, TurboVec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This is a strong baseline: FAISS uses high-precision LUTs at scoring time and k-means++ for codebook training. Despite this, TurboQuant and FAISS are within 0-1 points of each other on R@1 for OpenAI embeddings at d=1536 and d=3072, and both converge to 1.0 recall by k=4-8. GloVe at d=200 is harder: there, TurboQuant trails FAISS by 3-6 points on R@1, and the gap closes by k≈16-32.
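The article does not define R@k precisely. A common definition, and the one assumed in this sketch, is the fraction of queries whose exact nearest neighbour (from brute-force float32 search) appears among the top k approximate results. Under that reading, "converges to 1.0 by k=4-8" means the true neighbour is almost always within the first 4-8 results. The function names are hypothetical.

```python
def exact_top1(query, database):
    """Ground truth: index of the highest inner product, by brute force."""
    return max(range(len(database)),
               key=lambda i: sum(a * b for a, b in zip(query, database[i])))


def recall_at_k(true_top1, approx_results, k):
    """Fraction of queries whose exact nearest neighbour is in the approximate top-k."""
    hits = sum(1 for t, res in zip(true_top1, approx_results) if t in res[:k])
    return hits / len(true_top1)


# Toy example: exact neighbours for 4 queries, and ranked approximate results.
true_top1 = [7, 3, 9, 1]
approx = [[7, 2, 5], [4, 3, 8], [0, 6, 9], [2, 5, 8]]
for k in (1, 2, 3):
    print(k, recall_at_k(true_top1, approx, k))
# 1 0.25
# 2 0.5
# 3 0.75
```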

On speed, the ARM results (Apple M3 Max) show TurboVec beating FAISS IndexPQFastScan by 12-20% in every configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), TurboVec wins every 4-bit configuration by 1-6% and runs within ~1% of FAISS at 2-bit single-threaded. Two configurations trail FAISS slightly: 2-bit multi-threaded at d=1536 and d=3072. There, the inner accumulation loop is too short for loop unrolling to pay off, and FAISS's AVX-512 VBMI path keeps a 2-4% edge.

Python API

Installation is a single command: pip install turbovec. The primary class is TurboQuantIndex, initialized with a dimension and a bit width.

from turbovec import TurboQuantIndex

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
scores, indices = index.search(query, k=10)
index.write("my_index.tq")

A second class, IdMapIndex, supports stable external uint64 IDs that survive deletion, and removal by ID is O(1). This is useful for document stores where vectors are frequently updated or deleted.
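One common way to get O(1) removal by external ID is a dense array plus a hash map from ID to slot, where a deletion moves the last row into the freed slot. The sketch below shows that pattern in pure Python. It is an assumption about how such an index can work, not a description of turbovec's IdMapIndex; the class is hypothetical, its method names simply mirror the article's API, and the rows stand in for packed codes.

```python
class IdMap:
    """Hypothetical sketch: dense storage plus an external-ID -> slot map."""

    def __init__(self):
        self.rows = []     # dense storage (stand-in for packed codes)
        self.ids = []      # external uint64 ID for each slot
        self.slot_of = {}  # external ID -> slot index

    def add_with_ids(self, rows, ids):
        for row, ext in zip(rows, ids):
            if ext in self.slot_of:
                raise KeyError(f"duplicate id {ext}")
            self.slot_of[ext] = len(self.rows)
            self.rows.append(row)
            self.ids.append(ext)

    def remove(self, ext):
        """O(1): move the last row into the freed slot, then shrink by one."""
        slot = self.slot_of.pop(ext)  # KeyError if the ID is unknown
        last_row, last_id = self.rows.pop(), self.ids.pop()
        if slot < len(self.rows):     # the removed row was not the last one
            self.rows[slot] = last_row
            self.ids[slot] = last_id
            self.slot_of[last_id] = slot

    def search(self, query, k):
        """Brute-force inner product; returns external IDs, best first."""
        order = sorted(range(len(self.rows)),
                       key=lambda s: -sum(a * b for a, b in zip(query, self.rows[s])))
        return [self.ids[s] for s in order[:k]]

    def __len__(self):
        return len(self.rows)


m = IdMap()
m.add_with_ids([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], [1001, 1002, 1003])
m.remove(1002)
print(m.search([0.0, 1.0], k=2))  # [1003, 1001]
```

The trade-off is that slot order changes on every deletion, which is why callers receive external IDs rather than positional indices.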

TurboVec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). It is also available as a Rust crate: cargo add turbovec.

MarkTechPost's visual explainer

What is TurboVec?

TurboVec is a vector indexer written in Rust with Python bindings. It is built on Google Research's TurboQuant algorithm, a data-oblivious quantizer that requires zero codebook training. A 10-million-document corpus that occupies 31 GB as float32 fits into 4 GB with TurboVec.

✓ 16x compression at 2-bit

💨 Beats FAISS on ARM by 12-20%

🔒 Completely local: no data egress

📦 MIT licensed

Installation

Install the Python package from PyPI with a single command. For Rust, add the crate with Cargo.

# Python
pip install turbovec

# Rust
cargo add turbovec

Note: To build from source, install maturin and run maturin build --release inside the turbovec-python/ directory. For Rust, run cargo build --release.

Basic Usage – TurboQuantIndex

TurboQuantIndex is the primary class. Initialize it with a vector dim and a bit_width of 2 or 4. Vectors are indexed as soon as you call add(); no training phase is required.

from turbovec import TurboQuantIndex

index = TurboQuantIndex(dim=1536, bit_width=4)

# Add vectors (numpy float32 array, shape (n, dim))
index.add(vectors)
index.add(more_vectors)  # incremental adds are fine

# Search: returns top-k scores and positional indices
scores, indices = index.search(query, k=10)

Stable IDs – IdMapIndex

Use IdMapIndex when you need external uint64 IDs that survive removals. Removal by ID is O(1), which is useful for document stores where vectors change over time.

import numpy as np
from turbovec import IdMapIndex

index = IdMapIndex(dim=1536, bit_width=4)

# Map vectors to your own uint64 external IDs
# (vectors: float32 array of shape (3, 1536), one row per ID)
index.add_with_ids(vectors, np.array([1001, 1002, 1003], dtype=np.uint64))

# Search returns your external IDs, not positional indices
scores, ids = index.search(query, k=10)

# O(1) delete by external ID
index.remove(1002)

Save and load an index

Both index types support persistent storage. TurboQuantIndex writes to .tq files, and IdMapIndex writes to .tvim files.

from turbovec import TurboQuantIndex, IdMapIndex

# TurboQuantIndex -> .tq  (here `index` is a TurboQuantIndex)
index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")

# IdMapIndex -> .tvim  (here `index` is an IdMapIndex)
index.write("my_index.tvim")
loaded = IdMapIndex.load("my_index.tvim")

Framework Integration

TurboVec ships optional extras for LangChain, LlamaIndex, and Haystack. Install the extra that matches your stack.

# LangChain
pip install turbovec[langchain]

# LlamaIndex
pip install turbovec[llama-index]

# Haystack
pip install turbovec[haystack]

Tip: Each integration plugs into TurboVec as a drop-in vector store. See docs/integration/ in the repo for full usage examples with each framework.

Using TurboVac in Rust

The Rust API mirrors the Python API; both TurboQuantIndex and IdMapIndex are available. All x86_64 builds use AVX2 as a baseline, and AVX-512 is enabled at runtime through feature detection.

use turbovec::TurboQuantIndex;

let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);

let results = index.search(&queries, 10);

index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();

📚 Full API: docs/api.md

⭐ github.com/RyanCodrai/turbovec

Key takeaways

No codebook training. TurboVec indexes vectors instantly: no k-means, and no rebuilding as the corpus grows.
16x compression. A 1536-dim float32 vector shrinks from 6,144 bytes to 384 bytes at 2-bit quantization.
Faster than FAISS on ARM. TurboVec beats FAISS IndexPQFastScan on ARM by 12-20% in every configuration.
Near-optimal distortion. TurboQuant achieves distortion within ~2.7x of the Shannon lower bound, close to the theoretical limit.
Completely local. No managed services and no data egress; it pairs with any open-source embedding model for an air-gapped RAG stack.
