Learning recommender systems with adaptive regularization

Many factorization models like matrix or tensor factorization have been proposed for the important application of recommender systems. The success of such factorization models depends largely on the choice of good values for the regularization parameters. Without a careful selection they result in poor prediction quality as they either underfit or overfit the data. Regularization values are typically determined by an expensive search that requires learning the model parameters several times: once for each tuple of candidate values for the regularization parameters. In this paper, we present a new method that adapts the regularization automatically while training the model parameters. To achieve this, we optimize simultaneously for two criteria: (1) as usual the model parameters for the regularized objective and (2) the regularization of future parameter updates for the best predictive quality on a validation set. We develop this for the generic model class of Factorization Machines which subsumes a wide variety of factorization models. We show empirically, that the advantages of our adaptive regularization method compared to expensive hyperparameter search do not come to the price of worse predictive quality. In total with our method, learning regularization parameters is as easy as learning model parameters and thus there is no need for any time-consuming search of regularization values because they are found on-the-fly. This makes our method highly attractive for practical use.

[1]  Lars Schmidt-Thieme,et al.  Pairwise interaction tensor factorization for personalized tag recommendation , 2010, WSDM '10.

[2]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[3]  Arkadiusz Paterek,et al.  Improving regularized singular value decomposition for collaborative filtering , 2007 .

[4]  Ruslan Salakhutdinov,et al.  Bayesian probabilistic matrix factorization using Markov chain Monte Carlo , 2008, ICML '08.

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Xi Chen,et al.  Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization , 2010, SDM.

[7]  Lars Schmidt-Thieme,et al.  Learning optimal ranking with tensor factorization for tag recommendation , 2009, KDD.

[8]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[9]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[10]  Yehuda Koren,et al.  Collaborative filtering with temporal dynamics , 2009, KDD.

[11]  Lars Kai Hansen,et al.  Adaptive Regularization in Neural Network Modeling , 2012, Neural Networks: Tricks of the Trade.

[12]  L. Tucker,et al.  Some mathematical notes on three-mode factor analysis , 1966, Psychometrika.

[13]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[14]  Richard A. Harshman,et al.  Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-model factor analysis , 1970 .

[15]  Michael R. Lyu,et al.  SoRec: social recommendation using probabilistic matrix factorization , 2008, CIKM '08.

[16]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[17]  Robert Tibshirani,et al.  The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..

[18]  Sayan Mukherjee,et al.  Choosing Multiple Parameters for Support Vector Machines , 2002, Machine Learning.

[19]  Thomas Hofmann,et al.  Latent semantic models for collaborative filtering , 2004, TOIS.

[20]  Joos Vandewalle,et al.  A Multilinear Singular Value Decomposition , 2000, SIAM J. Matrix Anal. Appl..

[21]  Nathan Srebro,et al.  Fast maximum margin matrix factorization for collaborative prediction , 2005, ICML.