All You Need is Ratings: A Clustering Approach to Synthetic Rating Datasets Generation

Publicly available collections of user preferences are vital for the offline evaluation of recommender systems. However, the number of rating datasets is limited, both because of the cost of creating them and because of the fear that sharing them would violate users' privacy. For this reason, numerous research efforts have investigated the creation of synthetic rating collections using generative approaches. Nevertheless, these datasets are usually not reliable enough for conducting an evaluation campaign. In this paper, we propose a method for creating synthetic datasets with a configurable number of users that mimic the characteristics of existing ones. We empirically validated the proposed approach by using the synthetic datasets to evaluate different recommenders and comparing the results with those obtained on real datasets.
