Scalable Bayesian Modelling of Paired Symbols

We present a novel, scalable Bayesian approach to modelling the occurrence of pairs of symbols (i, j) drawn from a large vocabulary. Observed pairs are assumed to be generated by a simple popularity-based selection process followed by censoring with a preference function. By basing inference on the well-founded principle of variational bounding, and by using new site-independent bounds, we show how a scalable inference procedure can be obtained for large data sets. State-of-the-art results are presented on real-world movie-viewing data.
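The generative process described above can be sketched as follows. This is a minimal illustration, not the paper's actual model: the vocabulary size, the latent-factor form of the preference function (a sigmoid of an inner product of hypothetical vectors `U[i]` and `V[j]`), and all parameter values are assumptions chosen only to make the popularity-then-censoring idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary size and latent dimension (illustrative only).
n_symbols, dim = 100, 5

# Popularity weights for each symbol, normalised into a distribution.
pop = rng.random(n_symbols)
pop /= pop.sum()

# Latent vectors defining an assumed preference function sigma(u_i . v_j):
# the probability that a popularity-sampled pair survives censoring.
U = rng.normal(size=(n_symbols, dim))
V = rng.normal(size=(n_symbols, dim))

def sample_pair():
    """Draw (i, j) by popularity, then censor with the preference function."""
    while True:
        i = rng.choice(n_symbols, p=pop)
        j = rng.choice(n_symbols, p=pop)
        accept_prob = 1.0 / (1.0 + np.exp(-U[i] @ V[j]))
        if rng.random() < accept_prob:
            return i, j

pairs = [sample_pair() for _ in range(10)]
```

Inference would then amount to recovering the popularity weights and preference parameters from observed pairs alone; the paper's contribution is making that inversion tractable at scale via variational bounds.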
