The Yahoo! Music Dataset and KDD-Cup '11

KDD-Cup 2011 challenged the community to identify user tastes in music by leveraging Yahoo! Music user ratings. The competition hosted two tracks, based on two datasets that were sampled from the raw data and include hundreds of millions of ratings. The underlying ratings were given to four types of musical items: tracks, albums, artists, and genres, which form a four-level hierarchical taxonomy. The challenge started on March 15, 2011 and ended on June 30, 2011, attracting 2389 participants, 2100 of whom were active by the end of the competition. The popularity of the challenge reflects the fact that learning large-scale recommender systems is a generic problem that is highly relevant to industry. The contest also drew interest by introducing a number of scientific and technical challenges, including the size of the datasets, the hierarchical structure of the items, high-resolution timestamps of ratings, and a non-conventional ranking-based task. This paper provides the organizers' account of the contest, including a detailed analysis of the datasets, a discussion of the contest goals and actual conduct, and lessons learned throughout the contest.
