Combining Spotify and Twitter Data for Generating a Recent and Public Dataset for Music Recommendation

In this paper, we present a dataset based on publicly available information. It contains the listening histories of Spotify users who posted what they were currently listening to on the microblogging platform Twitter. The dataset was derived using the Twitter Streaming API and is updated regularly. To show an application of this dataset, we implement and evaluate a pure collaborative filtering-based recommender system. The performance of this system can serve as a baseline for evaluating more sophisticated recommendation approaches, which will be implemented and benchmarked against this baseline in future work.
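To make the idea of a pure collaborative filtering baseline concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes user-based filtering over sets of listened tracks with Jaccard similarity; the similarity measure, the function names `jaccard` and `recommend`, the parameters `k` and `n`, and the toy `histories` data are all illustrative assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(histories, user, k=2, n=3):
    """Recommend up to n unheard tracks for `user`.

    Each candidate track is scored by the summed similarity of the
    k most similar users (assumed neighbourhood size) who listened to it.
    """
    own = histories[user]
    # Rank all other users by similarity to the target user.
    neighbours = sorted(
        ((jaccard(own, tracks), other)
         for other, tracks in histories.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim == 0.0:
            continue  # users with no overlap carry no signal
        for track in histories[other] - own:
            scores[track] += sim
    # Highest score first; ties broken by track id for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


# Hypothetical toy listening histories (user -> set of track ids).
histories = {
    "alice": {"t1", "t2", "t3"},
    "bob": {"t2", "t3", "t4"},
    "carol": {"t1", "t5"},
    "dave": {"t6"},
}

print(recommend(histories, "alice"))
```

Because the input is only a set of tracks per user, this kind of baseline needs no ratings or content features, which matches the implicit feedback available from listening tweets.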
