A Time-Aware Exploration of RecSys15 Challenge Dataset

E-commerce is currently one of the main applications of recommender systems, since it generates vast amounts of data that can be used to make predictions and analyse users' behaviour. In this paper we present an overview of the public dataset used for the RecSys Challenge 2015. We describe the basic statistical properties of this dataset and how events (clicks and purchases) are distributed over products (items) and users (sessions). We also present a time-aware analysis of the dataset, with the aim of better understanding how user behaviour changes within time cycles and how these changes affect purchasing activity. We further study the relation between categories, the only type of metadata present in this completely anonymised dataset. We are interested both in how these categories are distributed and in how users and items interact with them. Finally, throughout the paper we explain the implications that the results of our analysis may have for building models for the challenge.
