DataGenCARS: A generator of synthetic data for the evaluation of context-aware recommendation systems

Abstract: Context-Aware Recommender Systems (CARS) have attracted significant research attention in recent years, driven by the interest in taking the user's context into account in order to offer more appropriate recommendations. However, evaluating CARS is challenging, because few available datasets incorporate context information related to the ratings provided by users. In this paper, we present DataGenCARS, a complete Java-based synthetic dataset generator that can produce the required datasets for any desired type of scenario, offering high flexibility in obtaining appropriate data for evaluating CARS. The generator provides features such as: a flexible definition of user schemas, user profiles, types of items, and types of contexts; realistic generation of ratings and item attributes; the possibility of mixing real and synthetic datasets; functionality to analyze existing datasets as a basis for synthetic data generation; and support for automatic mapping between item schemas and Java classes. Moreover, an experimental evaluation illustrates the interest and benefits of DataGenCARS.
