Statistical Methods for Recommender Systems

Designing algorithms that recommend items such as news articles and movies to users is a challenging task in numerous web applications. The crux of the problem is to rank items based on users' responses to different items so as to optimize multiple objectives. The major technical challenges are high-dimensional prediction with sparse data and the construction of high-dimensional sequential designs that collect data for user modeling and system design. This comprehensive treatment of the statistical issues that arise in recommender systems includes in-depth discussions of current state-of-the-art methods such as adaptive sequential designs (multi-armed bandit methods), bilinear random-effects models (matrix factorization), and scalable model fitting using modern computing paradigms like MapReduce. The authors draw upon their extensive experience working with such large-scale systems at Yahoo! and LinkedIn, and bridge the gap between theory and practice by illustrating complex concepts with examples from applications in which they are directly involved.
