Vector search underlies most retrieval-augmented generation (RAG) pipelines. At scale, it gets expensive. Storing 10 million document embeddings as float32 consumes 31 GB of RAM. For dev teams running local or on-premise inference, that number creates real constraints.
A new open-source library called turbovec addresses this directly. It is a vector index written in Rust with Python bindings, built on TurboQuant, a quantization algorithm from Google Research. The same 10-million-document corpus fits in 4 GB with turbovec. On ARM hardware, search speed beats FAISS IndexPQFastScan by 12-20%.
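The article does not state the embedding dimension or bit width behind the 31 GB and 4 GB figures. As a back-of-envelope check, the sketch below assumes 768 dimensions and 4-bit codes (both are assumptions, chosen because they roughly reproduce the quoted numbers) and ignores per-vector metadata such as stored norms.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_coord: int) -> float:
    """Approximate index size in decimal gigabytes, ignoring per-vector metadata."""
    total_bits = num_vectors * dim * bits_per_coord
    return total_bits / 8 / 1e9


# Assumed values: the article does not give the dimension or bit width for these figures.
n, d = 10_000_000, 768
float32_gb = index_size_gb(n, d, 32)
four_bit_gb = index_size_gb(n, d, 4)
print(f"float32: {float32_gb:.2f} GB, 4-bit codes: {four_bit_gb:.2f} GB")
```

Under these assumptions the float32 footprint lands just under 31 GB and the 4-bit footprint just under 4 GB, which matches the article's rounding.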
The TurboQuant paper
TurboQuant was introduced by a Google Research team, which proposes it as a data-oblivious quantizer. It achieves near-optimal distortion rates across all bit-widths and dimensions, and it requires zero training and zero passes over the data.
Most production-grade vector quantizers, including FAISS's product quantization, require a codebook training step. You need to run k-means on a representative sample of your vectors before indexing begins. If your corpus grows or shifts, you may need to retrain and rebuild the index from scratch. TurboQuant skips all of that. It relies on an analytical property of rotated vectors instead of data-dependent calibration.
How turbovec quantizes a vector
The quantization pipeline has four steps:
(1) Every vector is normalized. The length (norm) is stripped off and stored as a single float. Each vector becomes a unit direction on a high-dimensional hypersphere.
(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. After rotation, each coordinate follows a known Beta distribution. In high dimensions, this converges to a Gaussian, N(0, 1/d). This holds for any input data: the rotation makes the coordinate distribution predictable.
(3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be precomputed from the math alone. For 2-bit quantization, this means 4 buckets per coordinate. For 4-bit, this means 16 buckets. No pass over the data is required.
(4) Quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2-bit. This is a 16x compression ratio.
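The four steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not turbovec's implementation: the rotation is drawn via QR decomposition of a Gaussian matrix, the Lloyd-Max centroids are computed for the Gaussian limit N(0, 1/d) rather than the exact Beta distribution, and the helper names (`lloyd_max_gaussian`, `_pdf`, `_cdf`) are hypothetical. A small dimension (d=64) keeps the run fast.

```python
from math import erf

import numpy as np


def _pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)


def _cdf(x):
    return 0.5 * (1.0 + np.vectorize(erf)(x / np.sqrt(2)))


def lloyd_max_gaussian(bits, iters=500):
    """Lloyd-Max centroids and boundaries for N(0, 1), computed without any data."""
    centroids = np.linspace(-2.0, 2.0, 2**bits)
    for _ in range(iters):
        inner = (centroids[:-1] + centroids[1:]) / 2
        edges = np.concatenate(([-np.inf], inner, [np.inf]))
        mass = _cdf(edges[1:]) - _cdf(edges[:-1])
        # Conditional mean of a standard normal restricted to each bucket.
        centroids = (_pdf(edges[:-1]) - _pdf(edges[1:])) / mass
    return centroids, (centroids[:-1] + centroids[1:]) / 2


rng = np.random.default_rng(0)
n, d, bits = 1000, 64, 2
vectors = rng.normal(size=(n, d)) * 5 + 3  # arbitrary, uncalibrated input data

# Step 1: strip the norm and keep it as one float per vector.
norms = np.linalg.norm(vectors, axis=1)
unit = vectors / norms[:, None]

# Step 2: apply one shared random orthogonal matrix (assumed construction: QR).
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
rotated = unit @ rotation.T

# Step 3: scalar-quantize each coordinate with centroids scaled to N(0, 1/d).
unit_centroids, unit_boundaries = lloyd_max_gaussian(bits)
centroids = unit_centroids / np.sqrt(d)
boundaries = unit_boundaries / np.sqrt(d)
codes = np.searchsorted(boundaries, rotated).astype(np.uint8)  # values 0..3

# Step 4: pack four 2-bit codes into each byte.
shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
packed = np.bitwise_or.reduce(codes.reshape(n, d // 4, 4) << shifts, axis=2)

mse = np.mean(np.sum((centroids[codes] - rotated) ** 2, axis=1))
print(f"bytes per vector: float32={d * 4}, packed={packed.shape[1]}")
print(f"mean squared error on unit vectors: {mse:.3f}")
```

An approximate reconstruction of the original vectors is `(centroids[codes] @ rotation) * norms[:, None]`, since the rotation is orthogonal and can be undone by its transpose.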
At search time, the query is rotated once into the same domain. Scoring happens directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86, with an AVX2 fallback) and nibble-split lookup tables for throughput.
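The idea of scoring against codebook values can be illustrated with a plain NumPy lookup table. This is a sketch only: the index contents below are random stand-ins, the 2-bit centroid values are the approximate Lloyd-Max levels for a Gaussian, and none of this reflects turbovec's SIMD kernels or its nibble-split table layout.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, levels = 500, 32, 4  # 4 levels correspond to 2-bit codes

# Hypothetical stand-ins for an existing index: rotation, centroids, codes, norms.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
centroids = np.array([-1.510, -0.4528, 0.4528, 1.510]) / np.sqrt(d)
codes = rng.integers(0, levels, size=(n, d))
norms = rng.uniform(0.5, 2.0, size=n)

# Rotate the query once, with the same matrix used at indexing time.
query = rng.normal(size=d)
q_rot = rotation @ query

# lut[j, c] is the contribution of coordinate j when the stored code there is c.
lut = q_rot[:, None] * centroids[None, :]  # shape (d, levels)

# Score every stored vector by summing table entries; no dequantized copy is built.
scores = lut[np.arange(d), codes].sum(axis=1) * norms
top_k = np.argsort(-scores)[:10]
print(top_k)
```

The table has only `d * levels` entries, so it is cheap to build per query, and each stored vector is scored with lookups and additions rather than floating-point multiplications against the full vector.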
TurboQuant achieves distortion to within about 2.7x of the information-theoretic Shannon lower bound.
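The article does not derive the 2.7 figure. One plausible reading, offered here as an assumption rather than as the paper's own derivation, is the classical high-resolution approximation for an optimal scalar quantizer on a Gaussian source with variance σ² at b bits per coordinate, compared against the Shannon distortion-rate bound for the same source:

```latex
D_{\text{scalar}}(b) \approx \frac{\sqrt{3}\,\pi}{2}\,\sigma^{2}\,2^{-2b},
\qquad
D_{\text{Shannon}}(b) = \sigma^{2}\,2^{-2b},
\qquad
\frac{D_{\text{scalar}}(b)}{D_{\text{Shannon}}(b)} \approx \frac{\sqrt{3}\,\pi}{2} \approx 2.72
```

Both expressions decay at the same rate in b, so under this reading the gap is a constant factor rather than one that grows with the bit width.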
Recall and Speed: The Numbers
All benchmarks use 100K vectors, 1,000 queries, k=64, and report the mean of 5 runs.
For recall, turbovec is compared to FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This is a strong baseline: FAISS uses high-precision LUTs at scoring time and k-means++ for codebook training. Despite this, TurboQuant and FAISS are within 0-1 points of each other on R@1 for the OpenAI embeddings at d=1536 and d=3072. Both converge to 1.0 recall by k=4-8. GloVe at d=200 is harder. At that dimension, TurboQuant trails FAISS by 3-6 points on R@1, and the gap closes by k≈16-32.
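The article does not define its recall metric precisely. A common reading of "R@1 at k" is the fraction of queries whose exact nearest neighbor appears among the top-k approximate results; the sketch below implements that interpretation. The function name and the toy arrays are illustrative, not taken from the benchmark code.

```python
import numpy as np


def recall_at_1_within_k(true_nn: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact nearest neighbor is in the top-k results."""
    hits = (retrieved[:, :k] == true_nn[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data: exact nearest-neighbor ID per query, and ranked approximate results.
true_nn = np.array([7, 2, 9])
retrieved = np.array([[7, 1, 3], [5, 2, 8], [4, 6, 0]])

recall_k1 = recall_at_1_within_k(true_nn, retrieved, k=1)
recall_k3 = recall_at_1_within_k(true_nn, retrieved, k=3)
print(recall_k1, recall_k3)
```

Under this definition, recall can only rise as k grows, which is consistent with both libraries converging toward 1.0 at larger k.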
On speed, the ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12-20% in every configuration. On x86 (Intel Xeon Platinum 8481C, Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1-6%. It runs within ~1% of FAISS on 2-bit single-threaded. Two configurations sit slightly behind FAISS: 2-bit multi-threaded at d=1536 and d=3072. There, the inner accumulation loop is too short for loop unrolling to pay off, and FAISS's AVX-512 VBMI path has the edge by 2-4%.
Python API
Installation is a single command: pip install turbovec. The primary class is TurboQuantIndex, initialized with a dimension and a bit width.
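from turbovec import TurboQuantIndex

# Create an index for 1536-dimensional embeddings at 4 bits per coordinate
index = TurboQuantIndex(dim=1536, bit_width=4)
# Add the document embeddings; no training step comes first
index.add(vectors)
# Retrieve the 10 best matches for the query
scores, indices = index.search(query, k=10)
# Persist the index to disk
index.write("my_index.tq")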
A second class, IdMapIndex, supports stable external uint64 IDs that survive deletions. Removal by ID is O(1). This is useful for document stores where vectors are frequently updated or deleted.
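The article does not show the IdMapIndex API or explain how it reaches O(1) removal. One standard way to get that behavior is swap-removal backed by a hash map, sketched below in pure Python. The class `ToyIdMap` and its methods are hypothetical and are not turbovec's API or implementation.

```python
class ToyIdMap:
    """Illustrative only: stable external IDs over dense rows, with O(1) removal."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []  # dense storage; row order is internal
        self.ids: list[int] = []              # row -> external ID
        self.row_of: dict[int, int] = {}      # external ID -> row

    def add(self, ext_id: int, vector: list[float]) -> None:
        if ext_id in self.row_of:
            raise KeyError(f"ID {ext_id} already present")
        self.row_of[ext_id] = len(self.ids)
        self.ids.append(ext_id)
        self.vectors.append(vector)

    def remove(self, ext_id: int) -> None:
        row = self.row_of.pop(ext_id)
        last_id, last_vec = self.ids.pop(), self.vectors.pop()
        if row < len(self.ids):  # the removed row was not the last one
            # Move the former last row into the hole so storage stays dense.
            self.ids[row], self.vectors[row] = last_id, last_vec
            self.row_of[last_id] = row


store = ToyIdMap()
for ext_id in (101, 202, 303):
    store.add(ext_id, [float(ext_id)])
store.remove(101)
print(store.ids, store.row_of)
```

Internal row positions change on removal, but callers only ever see the external IDs, which is what lets those IDs stay valid across deletions.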
turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). It is also available as a Rust crate via cargo add turbovec.
Key Takeaways
- No codebook training. turbovec indexes vectors immediately: no k-means, and no rebuilds as the corpus grows.
- 16x compression. A 1536-dim Float32 vector shrinks from 6,144 bytes to 384 bytes at 2-bit quantization.
- Faster than FAISS on ARM. turbovec beats FAISS IndexPQFastScan on ARM by 12-20% in every configuration.
- Near-optimal distortion. TurboQuant achieves distortion within ~2.7x of the Shannon lower bound, close to the theoretical limit.
- Fully local. No managed service and no data egress; it pairs with any open-source embedding model for an air-gapped RAG stack.