Methodological Issues in Recommender Systems Research (Extended Abstract)

The development of continuously improved machine learning algorithms for personalized item ranking lies at the core of today's research in the area of recommender systems. Over the years, the research community has developed widely agreed best practices for comparing algorithms and demonstrating progress with offline experiments. Unfortunately, we find that this accepted research practice can easily lead to phantom progress, for the following reasons: limited reproducibility, comparisons against complex but weak and non-optimized baseline algorithms, and over-generalization from a small set of experimental configurations. To assess the extent of such problems, we analyzed 18 recently published research papers from top-ranked conferences. Only 7 of them were reproducible with reasonable effort, and 6 of these 7 could often be outperformed by relatively simple heuristic methods, e.g., nearest-neighbor techniques. In this paper, we discuss these observations in detail and reflect on the related fundamental problem of over-reliance on offline experiments in recommender systems research.
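To make concrete what a "relatively simple heuristic method" can look like, the following is a minimal sketch of an item-based nearest-neighbor top-n recommender using cosine similarity over implicit feedback. It is paired with a leave-one-out hit-rate check of the kind used in offline experiments. The function names, the neighborhood size `k`, the list length `n`, and the toy data are hypothetical illustrative choices. They are not the configurations or implementations used in the analyzed papers.

```python
from collections import defaultdict
from math import sqrt


def item_cosine_similarities(interactions):
    """interactions: dict mapping user -> set of item ids (implicit feedback).

    Returns a nested dict sims[a][b] holding the cosine similarity between
    the binary user vectors of items a and b (only nonzero entries kept).
    """
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                s = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims


def recommend(user_items, sims, k=20, n=10):
    """Score each unseen item by summing its similarity to the user's items,
    using only the k most similar neighbors of each seen item."""
    scores = defaultdict(float)
    for seen in user_items:
        neighbors = sorted(sims.get(seen, {}).items(),
                           key=lambda kv: kv[1], reverse=True)[:k]
        for candidate, s in neighbors:
            if candidate not in user_items:
                scores[candidate] += s
    # Sort by descending score; break ties by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def hit_rate_at_n(train, held_out, n=10, k=20):
    """Leave-one-out evaluation: held_out maps user -> one hidden item.
    Returns the fraction of users whose hidden item appears in their top-n."""
    sims = item_cosine_similarities(train)
    hits = 0
    for user, target in held_out.items():
        recs = recommend(train.get(user, set()), sims, k=k, n=n)
        hits += target in recs
    return hits / len(held_out) if held_out else 0.0


# Hypothetical toy data for illustration only.
train = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a", "d"},
}
held_out = {"u1": "c", "u4": "b"}

sims = item_cosine_similarities(train)
print(recommend(train["u1"], sims, n=2))      # ['c', 'd']
print(hit_rate_at_n(train, held_out, n=1))    # 1.0
```

Such a baseline has essentially one tuning knob (`k`) besides the similarity function. This is part of why it is cheap to optimize properly before comparing it against a more complex model.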
