Generating inter-dependent data streams for recommender systems

Abstract: Recommender systems are essential tools in modern e-commerce, streaming services, search engines, social networks, and many other areas, including the scientific community. However, the lack of publicly available data hinders the development and evaluation of recommender algorithms. To address this problem, we propose a Generator of Inter-dependent Data Streams (GIDS), which generates multiple temporal, inter-dependent synthetic datasets of relational data. The generator simulates a collection of time-changing data streams, which makes it suitable for evaluating a variety of recommender systems, data fusion algorithms, and incremental algorithms. An evaluation with recommender and data fusion algorithms showed that the generator can successfully mimic real datasets, both in their statistical properties and in the performance that recommender systems achieve on them.
