Exact and Approximate Maximum Inner Product Search with LEMP

We study exact and approximate methods for maximum inner product search, a fundamental problem in a number of data mining and information retrieval tasks. We propose the LEMP framework, which supports both exact and approximate search with quality guarantees. At its heart, LEMP transforms a maximum inner product search problem over a large database of vectors into a number of smaller cosine similarity search problems. This transformation allows LEMP to prune large parts of the search space immediately and to select a suitable search algorithm for each of the remaining problems individually. LEMP can leverage existing methods for cosine similarity search, but we also provide a number of novel search algorithms tailored to our setting. We conducted an extensive experimental study that provides insight into the performance of many state-of-the-art techniques, including LEMP, on multiple real-world datasets. We found that LEMP was often significantly faster or more accurate than alternative methods.
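
The transformation described above can be illustrated with a minimal sketch. It rests on the identity q·p = ‖q‖‖p‖cos(q, p), so the condition q·p ≥ θ is equivalent to cos(q, p) ≥ θ / (‖q‖‖p‖). The sketch assumes a threshold variant of the problem (find all database vectors whose inner product with the query is at least θ > 0) and assumes that the smaller problems are formed by grouping database vectors of similar length into buckets; the abstract does not spell out either detail. The names `build_buckets` and `above_theta` are hypothetical, and the inner loop is a plain scan standing in for whichever cosine similarity search algorithm would be selected per bucket.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_buckets(probes, bucket_size):
    """Sort database vectors by decreasing length and cut them into buckets.

    Zero-length vectors are dropped: with theta > 0 they can never qualify.
    """
    by_length = sorted(((norm(p), i) for i, p in enumerate(probes)), reverse=True)
    by_length = [(length, i) for length, i in by_length if length > 0]
    buckets = []
    for start in range(0, len(by_length), bucket_size):
        chunk = by_length[start:start + bucket_size]
        # The first entry of each chunk is its longest vector.
        buckets.append({"max_len": chunk[0][0], "items": chunk})
    return buckets


def above_theta(query, probes, buckets, theta):
    """Return indices i with dot(query, probes[i]) >= theta, assuming theta > 0."""
    q_len = norm(query)
    results = []
    if q_len == 0:
        return results
    for bucket in buckets:
        # Smallest cosine any vector in this bucket would need to qualify.
        local_threshold = theta / (q_len * bucket["max_len"])
        if local_threshold > 1.0:
            # A cosine cannot exceed 1, so this bucket is pruned. Buckets are
            # ordered by decreasing length, so all later ones are pruned too.
            break
        # Remaining problem: a cosine similarity search within one bucket.
        for p_len, i in bucket["items"]:
            cosine = dot(query, probes[i]) / (q_len * p_len)
            if cosine >= theta / (q_len * p_len):
                results.append(i)
    return results
```

For example, with `probes = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.1], [0.5, 0.5]]`, a bucket size of 2, the query `[1.0, 1.0]`, and θ = 1.5, the second bucket holds only short vectors, so its local threshold exceeds 1 and it is skipped without computing any inner product.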
