Writing
4 posts and counting.
2026
- We don't need to compromise on freshness in retrieval
Why I'm excited about TopK's semantic_index: most retrieval systems buy quality by trading away freshness. Co-designing the model, inference engine, and database lets you keep both.
- Late interaction has no LIMIT
The LIMIT benchmark is held up as evidence that multi-vector models struggle. I tried to reproduce it and found that late interaction is embarrassingly good at exactly this task.
2025
- The future is sparse: compressing embeddings with CompresSAE
Embedding databases with hundreds of millions of vectors are costly to serve. CompresSAE uses a sparse autoencoder to cut the footprint by more than 10× with only a small hit to retrieval quality.
2024
- Your turn, TikTok: the hidden reason Instagram's new algorithm could win short video
Instagram's ranking changes are marketed as a win for small creators. Underneath, they read like a scalability play for early-stage collaborative filtering.