Indexable Probabilistic Matrix Factorization for Maximum Inner Product Search

The Maximum Inner Product Search (MIPS) problem, prevalent in matrix factorization-based recommender systems, scales linearly with the number of objects to score. Recent work has shown that clever post-processing steps can turn the MIPS problem into a nearest neighbour one, allowing sublinear retrieval time either through Locality Sensitive Hashing or various tree structures that partition the Euclidian space. This work shows that instead of employing post-processing steps, substantially faster retrieval times can be achieved for the same accuracy when inference is not decoupled from the indexing process. By framing matrix factorization to be natively indexable, so that any solution is immediately sublinearly searchable, we use the machinery of Machine Learning to best learn such a solution. We introduce Indexable Probabilistic Matrix Factorization (IPMF) to shift the traditional post-processing complexity into the training phase of the model. Its inference procedure is based on Geodesic Monte Carlo, and adds minimal additional computational cost to standard Monte Carlo methods for matrix factorization. By coupling inference and indexing in this way, we achieve more than a 50% improvement in retrieval time against two state of the art methods, for a given level of accuracy in the recommendations of two large-scale recommender systems.

[1]  Patrick Seemann,et al.  Matrix Factorization Techniques for Recommender Systems , 2014 .

[2]  Ping Li,et al.  Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS) , 2014, UAI.

[3]  Ole Winther,et al.  A hierarchical model for ordinal matrix factorization , 2012, Stat. Comput..

[4]  Nathan Srebro,et al.  On Symmetric and Asymmetric LSHs for Inner Product Search , 2014, ICML.

[5]  Michael I. Jordan,et al.  Mixed Membership Matrix Factorization , 2010, ICML.

[6]  Moses Charikar,et al.  Similarity estimation techniques from rounding algorithms , 2002, STOC '02.

[7]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[8]  Ulrich Paquet,et al.  Speeding up the Xbox recommender system using a euclidean transformation for inner-product spaces , 2014, RecSys '14.

[9]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[10]  Neil D. Lawrence,et al.  Non-linear matrix factorization with Gaussian processes , 2009, ICML '09.

[11]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[12]  William Nick Street,et al.  Collaborative filtering via euclidean embedding , 2010, RecSys '10.

[13]  James Bennett,et al.  The Netflix Prize , 2007 .

[14]  Yehuda Koren,et al.  The Yahoo! Music Dataset and KDD-Cup '11 , 2012, KDD Cup.

[15]  Pietro Perona,et al.  Indexing in large scale image collections: Scaling properties and benchmark , 2011, 2011 IEEE Workshop on Applications of Computer Vision (WACV).

[16]  Ping Li,et al.  Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) , 2014, NIPS.

[17]  Max Welling,et al.  Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures , 2010, AAAI.

[18]  A. Wood Simulation of the von mises fisher distribution , 1994 .

[19]  M. Girolami,et al.  Riemann manifold Langevin and Hamiltonian Monte Carlo methods , 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[20]  David G. Lowe,et al.  Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Ulrich Paquet,et al.  One-class collaborative filtering with random graphs , 2013, WWW.

[22]  I. Dhillon,et al.  Modeling Data using Directional Distributions , 2003 .

[23]  Parikshit Ram,et al.  Maximum inner-product search using cone trees , 2012, KDD.

[24]  M. Girolami,et al.  Geodesic Monte Carlo on Embedded Manifolds , 2013, Scandinavian journal of statistics, theory and applications.

[25]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[26]  Yehuda Koren,et al.  The BellKor Solution to the Netflix Grand Prize , 2009 .

[27]  Ruslan Salakhutdinov,et al.  Bayesian probabilistic matrix factorization using Markov chain Monte Carlo , 2008, ICML '08.

[28]  Parikshit Ram,et al.  Efficient retrieval of recommendations in a matrix factorization framework , 2012, CIKM.