A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research

The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state of the art is claimed. However, indications exist of certain problems in today's research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. In order to obtain a better understanding of the actual progress, we have tried to reproduce recent results in the area of neural recommendation approaches based on collaborative filtering. The worrying outcome of the analysis of these recent works (all were published at prestigious scientific conferences between 2015 and 2018) is that 11 out of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on nearest-neighbor heuristics. None of the computationally complex neural methods was actually consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today's research practice, which, despite the many papers that are published on the topic, have apparently led the field to a certain level of stagnation.
