ContentWise Impressions: An Industrial Dataset with Impressions Included

In this article, we introduce the ContentWise Impressions dataset, a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its media content over the Internet. The dataset is distinguished from other available multimedia recommendation datasets by its size, by being open-source, and by the availability of impressions, i.e., the recommendations shown to the user. We describe the data collection process, the preprocessing applied, and the dataset's characteristics and statistics in comparison with other commonly used datasets. We also highlight several possible use cases and research questions that can benefit from the availability of user impressions in an open-source dataset. Furthermore, we release software tools to load and split the data, as well as examples of how to use both user interactions and impressions in several common recommendation algorithms.
