Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms

Top-N item recommendation from implicit feedback is a widely studied task. Although much progress has been made with neural methods, there is growing concern about how recommendation algorithms should be evaluated. In this paper, we revisit alternative experimental settings for evaluating top-N recommendation algorithms, considering three important factors: dataset splitting, sampled metrics, and domain selection. We select eight representative recommendation algorithms (covering both traditional and neural methods) and conduct extensive experiments on a very large dataset. By carefully revisiting the different options, we arrive at several important findings on these three factors, which offer direct suggestions on how to set up experiments for top-N item recommendation appropriately.
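Two of the three factors, dataset splitting and sampled metrics, are easy to make concrete. The Python sketch below is an illustration only, not the authors' code. It contrasts a temporal and a random ratio-based split, and a full-ranking and a sampled hit ratio (HR@n) for one held-out item. The `(user, item, timestamp)` tuple format, the function names `split_interactions`, `rank_of` and `hit_at_n`, the global (not per-user) split, and uniform negative sampling are all hypothetical choices made for this sketch.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy format used only in this sketch: (user_id, item_id, timestamp)
Interaction = Tuple[str, str, int]


def split_interactions(data: Sequence[Interaction], test_ratio: float = 0.2,
                       temporal: bool = True, seed: int = 0):
    """Global ratio split, either by timestamp order (temporal) or after shuffling (random)."""
    rows = list(data)
    if temporal:
        rows.sort(key=lambda r: r[2])  # older interactions go to training
    else:
        random.Random(seed).shuffle(rows)  # ignore time entirely
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def rank_of(target: str, candidates: List[str], scores: Dict[str, float]) -> int:
    """1-based rank of target among candidates; ties are counted against the target."""
    t = scores[target]
    return 1 + sum(1 for c in candidates if c != target and scores[c] >= t)


def hit_at_n(target: str, user_seen: Set[str], all_items: List[str],
             scores: Dict[str, float], n: int, num_negatives: int = 0,
             seed: int = 0) -> int:
    """HR@n for one held-out item.

    num_negatives == 0: full ranking against every item the user has not seen.
    num_negatives > 0: sampled metric, ranking the target against that many
    uniformly sampled unseen items.
    """
    pool = [i for i in all_items if i not in user_seen and i != target]
    if num_negatives > 0:
        pool = random.Random(seed).sample(pool, min(num_negatives, len(pool)))
    return int(rank_of(target, pool + [target], scores) <= n)
```

Because the sampled pool is a subset of the full candidate pool, the sampled rank of an item can never be worse than its full-ranking rank, so the sampled HR@n is always at least the full-ranking HR@n for the same held-out item. For example, an item ranked third overall misses HR@1 under full ranking but can hit it when only one negative is sampled.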
