Norm Adjusted Proximity Graph for Fast Inner Product Retrieval

Efficient inner product search over embedding vectors is often a vital stage in online ranking services such as recommendation and information retrieval. Recommendation algorithms, e.g., matrix factorization, typically produce latent vectors that represent users and items. The recommendation service then retrieves the item vectors most relevant to a given user vector, where relevance is usually defined by the inner product. Building efficient recommender systems therefore requires solving the so-called maximum inner product search (MIPS) problem, which has been studied extensively over the past decade. The task is challenging in part because the inner product does not satisfy the triangle inequality and is thus not a metric. Compared with hash-based and quantization-based MIPS solutions, graph-based MIPS algorithms have in recent years demonstrated strong empirical advantages on many real-world MIPS tasks. In this paper, we propose a new index-graph construction method, the norm adjusted proximity graph (NAPG), for efficient MIPS. Using adjusting factors estimated from sampled data, NAPG selects more meaningful data points to connect when constructing the graph-based index for inner product search. Extensive experiments on a variety of datasets verify that the improved graph-based index provides another strong addition to the pool of efficient MIPS algorithms.
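As context for the terminology above, the following is a minimal Python sketch, assuming NumPy embeddings, of (i) the exact MIPS objective and (ii) the greedy traversal that graph-based indexes typically perform at query time. It does not implement NAPG's norm-adjusted construction; the function names (`exact_mips`, `greedy_graph_search`) and the random adjacency list are hypothetical stand-ins.

```python
# Illustrative sketch only: exact MIPS baseline and greedy graph-search at query
# time. Graph construction (including NAPG's norm-adjusted edge selection) is
# not shown; `neighbors` is a hypothetical prebuilt adjacency list.
import numpy as np

def exact_mips(query: np.ndarray, items: np.ndarray) -> int:
    """Exact baseline: index of the item with the largest inner product, O(n*d)."""
    return int(np.argmax(items @ query))

def greedy_graph_search(query: np.ndarray,
                        items: np.ndarray,
                        neighbors: list,
                        start: int = 0) -> int:
    """Walk the proximity graph, always moving to the neighbor with the larger
    inner product against the query, until no neighbor improves the score."""
    current = start
    best = float(items[current] @ query)
    improved = True
    while improved:
        improved = False
        for nb in neighbors[current]:
            score = float(items[nb] @ query)
            if score > best:
                best, current, improved = score, nb, True
    return current

# Toy usage with random embeddings and a random (hence meaningless) graph,
# included only to make the sketch executable end to end.
rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 32))
query = rng.normal(size=32)
neighbors = [rng.choice(1000, size=8, replace=False).tolist() for _ in range(1000)]
print(exact_mips(query, items), greedy_graph_search(query, items, neighbors))
```

The quality of the greedy walk depends entirely on which edges the index keeps, which is the step NAPG's norm-adjusted selection targets.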
