ProMIPS: Efficient High-Dimensional c-Approximate Maximum Inner Product Search with a Lightweight Index

Owing to its wide application in recommendation systems, multi-class label prediction, and deep learning, the Maximum Inner Product (MIP) search problem has received extensive attention in recent years. On large-scale datasets of high-dimensional feature vectors, state-of-the-art LSH-based methods usually require a large number of hash tables or long hash codes to ensure search quality, which consumes substantial index space and causes excessive disk page accesses. In this paper, we relax the accuracy guarantee in exchange for efficiency and propose an efficient method for c-Approximate Maximum Inner Product (c-AMIP) search with a lightweight iDistance index. We project high-dimensional points to low-dimensional ones via 2-stable random projections and derive probability-guaranteed search conditions, under which the accuracy of the c-AMIP results is guaranteed with an arbitrary probability. To further improve efficiency, we propose Quick-Probe, which determines in advance a search bound satisfying the derived condition and thereby avoids an inefficient incremental search process. Extensive experimental evaluations on four real datasets demonstrate that our method incurs lower pre-processing cost, in terms of both index size and pre-processing time. In addition, compared with state-of-the-art benchmark methods, it delivers superior search quality in terms of overall ratio and recall, and superior efficiency in terms of page accesses and running time.
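
To make the two central notions of the abstract concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the commonly used c-AMIP criterion (for 0 < c < 1 and a positive maximum inner product, a returned point o is acceptable if <o, q> >= c * <o*, q>, where o* is the exact MIP point), and it uses the fact that the Gaussian distribution is 2-stable, so a 2-stable random projection can be built from a matrix of i.i.d. N(0, 1) entries. All function names (`inner`, `exact_mip`, `is_c_amip`, `gaussian_projection`, `project`) are hypothetical; the iDistance index and Quick-Probe are not modeled here.

```python
import random


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def exact_mip(points, q):
    """Brute-force MIP: index of the point with the largest <o, q>."""
    return max(range(len(points)), key=lambda i: inner(points[i], q))


def is_c_amip(points, q, idx, c):
    """Check the assumed c-AMIP criterion <o, q> >= c * <o*, q>.

    Meaningful when 0 < c < 1 and the maximum inner product is positive.
    """
    best = inner(points[exact_mip(points, q)], q)
    return inner(points[idx], q) >= c * best


def gaussian_projection(dim, m, seed=0):
    """m x dim matrix with i.i.d. N(0, 1) entries (Gaussian is 2-stable)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(matrix, v):
    """Map a dim-dimensional vector to m dimensions: one dot product per row."""
    return [inner(row, v) for row in matrix]
```

For example, with `points = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]` and `q = [1.0, 1.0]`, the inner products are 1, 2, and 4, so `exact_mip(points, q)` returns index 2, and the point at index 1 satisfies the criterion for c = 0.5 while the point at index 0 does not. Because each projected coordinate of a vector v is distributed as N(0, ||v||^2), distances in the projected space follow a known distribution, which is the kind of property that probability-guaranteed search conditions can build on.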
