About
I'm Martin Spišák. I do search research at TopK — you'll find a lot of this work in the TopK docs. Before that I worked on recommender systems at Recombee — across streaming platforms (video and audio) and news — and earlier on large-scale e-commerce at GLAMI.
My research interests revolve around the real-world problems people hit with retrieval — cost, scalability, and freshness. Retrieval is a beautifully general problem: it's the foundation underneath essentially every search and recommender system, so making it cheaper and faster pays off everywhere at once. I love working on the algorithms that get it there.
The thread running through most of my work is sparsity. Lately that's meant scaling late-interaction (ColBERT-style) retrieval with sparse multi-vector encoding, figuring out why some late-interaction models break under it and fixing them, and making the long-running case that the future is sparse: compressing dense embeddings into high-dimensional sparse ones cuts memory without giving up quality. What I find most elegant is that these sparse vectors carry their own index-like structure, so you can retrieve over them directly, with no separate index to build and rebuild.
Highlights
- Best Short Paper Runner-Up, RecSys 2023, for SANSA, a sparse, scalable reformulation of EASE. It has had outsized real-world impact precisely because interaction graphs are so sparse: it stays extremely efficient on million-node graphs, and the algorithm still powers retrieval over billions of interactions every day across several domains.
- SMVE (sparse multi-vector encoding) is the current state of the art for multi-vector (late-interaction) retrieval. Together with The Future is Sparse, it's gaining traction in the retrieval community, again for the efficiency that sparsity buys.
- Top-tier publication — “Efficient Learning of Sparse Representations from Interactions” at WWW 2026 (The Web Conference, an A* venue).
- Sparse autoencoders in the newsroom — first-authored applied work using SAEs to surface segment-level insights for real-time editorial support across news media groups (INRA 2025).
- Program committee — Industry Track reviewer at UMAP 2025, RecSys 2025, and RecSys 2026.
- Evaluation tooling at Recombee — championed, designed, and built an internal tool for anytime-valid statistics (A/B testing), and led the initiative to improve evaluation tooling.
Selected publications
- Efficient Learning of Sparse Representations from Interactions — WWW 2026 (top-tier / A* venue)
- From Knots to Knobs: Towards Steerable Collaborative Filtering Using Sparse Autoencoders — 2026 preprint
- The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems — RecSys 2025
- SAGEA: Sparse Autoencoder-based Group Embeddings Aggregation for Group Recommendations — RecSys 2025
- Segment-Aware Analytics for Real-Time Editorial Support in Media Groups — INRA 2025
- On Interpretability of Linear Autoencoders — RecSys 2024
- Scalable Approximate NonSymmetric Autoencoder for Collaborative Filtering — RecSys 2023 (Best Short Paper Runner-Up)
Full list on Google Scholar and DBLP.