Novel Models and Ensemble Techniques to Discriminate Favorite Items from Unrated Ones for Personalized Music Recommendation

The Track 2 problem in KDD-Cup 2011 (music recommendation) is to discriminate between music tracks highly rated by a given user from those which are overall highly rated, but not rated by the given user. The training dataset consists of not only user rating history, but also the taxonomic information of track, artist, album, and genre. This paper describes the solution of the National Taiwan University team which ranked first place in the competition. We exploited a diverse of models (neighborhood models, latent models, Bayesian Personalized Ranking models, and random-walk models) with local blending and global ensemble to achieve 97.45% in accuracy on the testing dataset.

[1]  Yehuda Koren,et al.  The BellKor Solution to the Netflix Grand Prize , 2009 .

[2]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[3]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[4]  Taghi M. Khoshgoftaar,et al.  A Survey of Collaborative Filtering Techniques , 2009, Adv. Artif. Intell..

[5]  Yehuda Koren,et al.  The Yahoo! Music Dataset and KDD-Cup '11 , 2012, KDD Cup.

[6]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[7]  M. I. Jordan Leo Breiman , 2011, 1101.0929.

[8]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[9]  Gediminas Adomavicius,et al.  Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.

[10]  Chih-Jen Lin,et al.  Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.

[11]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.

[12]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[13]  Hsuan-Tien Lin,et al.  One-sided Support Vector Regression for Multiclass Cost-sensitive Classification , 2010, ICML.

[14]  Pang-Ning Tan,et al.  Recommendation via Query Centered Random Walk on K-Partite Graph , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[15]  Ling Li,et al.  Optimizing 0/1 Loss for Perceptrons by Random Coordinate Descent , 2007, 2007 International Joint Conference on Neural Networks.

[16]  Qiang Yang,et al.  One-Class Collaborative Filtering , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[17]  Neil D. Lawrence,et al.  Non-linear matrix factorization with Gaussian processes , 2009, ICML '09.

[18]  Ma Chih-Chao Large-scale Collaborative Filtering Algorithms , 2008 .