The plista dataset

Releasing datasets has fostered research in fields such as information retrieval and recommender systems. Datasets are typically tailored to specific scenarios. In this work, we present the plista dataset. The dataset contains a collection of news articles published on 13 news portals, together with user interactions with those articles. We introduce the dataset's main characteristics and illustrate possible applications of the dataset.
