Adaptive Selection of Anchor Items for CUR-based k-NN search with Cross-Encoders

Cross-encoder models, which jointly encode and score a query-item pair, are typically prohibitively expensive for k-nearest neighbor (k-NN) search. Consequently, k-NN search is performed not with the cross-encoder itself but with a heuristic retrieve-and-rerank approach (e.g., retrieving with BM25 or a dual-encoder). Recent work proposes ANNCUR (Yadav et al., 2022), which uses CUR matrix factorization to produce an embedding space for efficient vector-based search that directly approximates the cross-encoder, without the need for dual-encoders. ANNCUR defines this shared query-item embedding space by scoring the test query against anchor items sampled uniformly at random. While this minimizes the average approximation error over all items, the approximation error on the top-k items remains unacceptably high, leading to poor recall of the top-k (and especially the top-1) items. Increasing the number of anchor items is a straightforward way to reduce ANNCUR's approximation error, and hence improve its k-NN recall, but it comes at the cost of increased inference latency. In this paper, we propose a new method for adaptively choosing anchor items that minimizes the approximation error for the practically important top-k neighbors of a query with minimal computational overhead. Our method incrementally selects a suitable set of anchor items for a given test query over several rounds, using the anchors chosen in previous rounds to inform the selection of further anchor items. Empirically, our method consistently improves k-NN recall compared to both ANNCUR and the widely used dual-encoder-based retrieve-and-rerank approach.
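The abstract describes both the ANNCUR-style CUR approximation and the round-based anchor selection only at a high level. The following is a minimal numpy sketch on a synthetic, exactly low-rank scorer standing in for a real cross-encoder; all names (`approx_scores`, `adaptive_anchor_search`) and the budget/round schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a cross-encoder: ground-truth scores come from hidden
# low-rank factors, so the score matrix is exactly low-rank.  Real
# cross-encoder score matrices are only approximately low-rank, which is
# where concentrating anchors near the top-k pays off.
d, n_items, n_train_queries = 8, 500, 40
item_f = rng.normal(size=(n_items, d))
query_f = rng.normal(size=(n_train_queries, d))

def cross_encoder(qvec, item_idx):
    """The 'expensive' call: score one query against a subset of items."""
    return item_f[item_idx] @ qvec

# Offline: score all training queries against all items.  These rows play
# the role of R in the CUR decomposition and act as item embeddings.
R = query_f @ item_f.T  # shape (n_train_queries, n_items)

def approx_scores(qvec, anchor_items):
    """CUR-style approximation of the full score row for a test query:
    score the query only on the anchor items (sampled columns C), then
    map through the intersection block W to get a query embedding."""
    c = cross_encoder(qvec, anchor_items)   # observed column entries
    W = R[:, anchor_items]                  # intersection of rows and columns
    q_emb = c @ np.linalg.pinv(W)           # least-squares fit: q_emb @ W ~ c
    return q_emb @ R                        # approximate scores for all items

def adaptive_anchor_search(qvec, budget=30, rounds=3, k=10):
    """Spread the anchor budget over several rounds: after a uniformly
    random first round, each round adds the items the current approximation
    ranks highest, concentrating accuracy on the likely top-k."""
    per_round = budget // rounds
    anchors = list(rng.choice(n_items, per_round, replace=False))
    for _ in range(rounds - 1):
        est = approx_scores(qvec, np.array(anchors))
        est[anchors] = -np.inf              # never re-pick an anchor
        anchors.extend(np.argsort(est)[-per_round:])
    est = approx_scores(qvec, np.array(anchors))
    return np.argsort(est)[-k:][::-1]       # indices of approximate top-k
```

In this exactly low-rank toy, `approx_scores` recovers the true scores once the anchor set spans the hidden rank, so even the first round is accurate; with a real cross-encoder the approximation is inexact, and feeding each round's top-ranked items back in as anchors is what drives the error down where it matters.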

[1] M. Zaheer, et al. Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization, 2022, EMNLP.

[2] Jane A. Yu, et al. Few-shot Learning with Retrieval Augmented Language Models, 2022, J. Mach. Learn. Res.

[3] A. McCallum, et al. Sublinear Time Approximation of Text Similarity Matrices, 2021, AAAI.

[4] M. Zaharia, et al. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, 2021, NAACL.

[5] Weijie Zhao, et al. Fast Neural Ranking on Bipartite Graph Indices, 2021, Proc. VLDB Endow.

[6] Emine Yilmaz, et al. Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations, 2021, ICLR.

[7] Trevor Hastie, et al. Weighted Low Rank Matrix Approximation and Acceleration, 2021, ArXiv.

[8] Govinda M. Kamath, et al. Bandit-Based Monte Carlo Optimization for Nearest Neighbors, 2021, IEEE Journal on Selected Areas in Information Theory.

[9] Karl Stratos, et al. Understanding Hard Negatives in Noise Contrastive Estimation, 2021, NAACL.

[10] Ruofei Zhang, et al. TwinBERT: Distilling Knowledge to Twin-Structured Compressed BERT Models for Large-Scale Retrieval, 2020, CIKM.

[11] Hua Wu, et al. RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering, 2020, NAACL.

[12] Allan Hanbury, et al. Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation, 2020, ArXiv.

[13] Ming-Wei Chang, et al. Retrieval Augmented Language Model Pre-Training, 2020, ICML.

[14] Jacob Eisenstein, et al. Sparse, Dense, and Attentional Representations for Text Retrieval, 2020, Transactions of the Association for Computational Linguistics.

[15] M. Zaharia, et al. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, 2020, SIGIR.

[16] Danqi Chen, et al. Dense Passage Retrieval for Open-Domain Question Answering, 2020, EMNLP.

[17] Shulong Tan, et al. Fast Item Ranking under Neural Network based Measures, 2020, WSDM.

[18] Luke Zettlemoyer, et al. Zero-shot Entity Linking with Dense Entity Retrieval, 2019, ArXiv.

[19] Eric Nyberg, et al. Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study, 2019, SISAP.

[20] Eric Nyberg, et al. Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs, 2019, SISAP.

[21] Sanjiv Kumar, et al. Accelerating Large-Scale Inference with Anisotropic Vector Quantization, 2019, ICML.

[22] Ming-Wei Chang, et al. Zero-Shot Entity Linking by Reading Entity Descriptions, 2019, ACL.

[23] David P. Woodruff, et al. Sample-Optimal Low-Rank Approximation of Distance Matrices, 2019, COLT.

[24] Allan Hanbury, et al. On the Effect of Low-Frequency Terms on Neural-IR Models, 2019, SIGIR.

[25] J. Weston, et al. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring, 2019, ICLR.

[26] W. Bruce Croft, et al. From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing, 2018, CIKM.

[27] David P. Woodruff, et al. Sublinear Time Low-Rank Approximation of Distance Matrices, 2018, NeurIPS.

[28] Anastasios Kyrillidis. Simple and practical algorithms for $\ell_p$-norm low-rank approximation, 2018, ArXiv (1805.09464).

[29] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[30] Nicolas Gillis, et al. Low-Rank Matrix Approximation in the Infinity Norm, 2017, Linear Algebra and its Applications.

[31] David P. Woodruff, et al. Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, 2017, FOCS.

[32] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.

[33] Inderjit S. Dhillon, et al. A Greedy Approach for Budgeted Maximum Inner Product Search, 2016, NIPS.

[34] Volkan Cevher, et al. Randomized Single-View Algorithms for Low-Rank Matrix Approximation, 2016, Technical Report.

[35] David P. Woodruff, et al. Weighted low rank approximations with provable guarantees, 2016, STOC.

[36] Francesco Ricci, et al. A survey of active learning in collaborative filtering recommender systems, 2016, Comput. Sci. Rev.

[37] Seungjin Choi, et al. Weighted nonnegative matrix factorization, 2009, ICASSP.

[38] Petros Drineas, et al. CUR matrix decompositions for improved data analysis, 2009, Proceedings of the National Academy of Sciences.

[39] Lawrence Cayton. Fast nearest neighbor retrieval for Bregman divergences, 2008, ICML.

[40] Hanan Samet, et al. Index-driven similarity search in metric spaces (Survey Article), 2003, TODS.

[41] Tommi S. Jaakkola, et al. Weighted Low-Rank Approximations, 2003, ICML.

[42] Ricardo A. Baeza-Yates, et al. Searching in metric spaces, 2001, CSUR.

[43] S. Goreinov, et al. A Theory of Pseudoskeleton Approximations, 1997.

[44] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[45] David P. Woodruff, et al. Algorithms for ℓp Low Rank Approximation, 2017.

[46] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Stephen E. Robertson, et al. Okapi at TREC, 1996, TREC.