More Is Less: When Do Recommenders Underperform for Data-rich Users?

Users of recommender systems differ in how much they interact with these algorithms, which may affect the quality of the recommendations they receive and lead to an undesirable performance disparity. In this paper we investigate under what conditions performance for data-rich and data-poor users diverges, for a collection of popular evaluation metrics applied to ten benchmark datasets. We find that Precision is consistently higher for data-rich users across all datasets; Mean Average Precision is comparable across user groups but has high variance; and Recall yields the counter-intuitive result that algorithms appear to perform better for data-poor than for data-rich users, a bias that is further exacerbated when negative item sampling is employed during evaluation. The last observation suggests that, as users interact more with a recommender system, the quality of the recommendations they receive degrades (when measured by Recall). Our findings clearly demonstrate how much the choice of evaluation protocol influences the reported results when studying recommender systems.
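To make the metric behaviour concrete, below is a minimal Python sketch (not the paper's actual code) of Precision@K, Recall@K, and AP@K, plus a sampled-evaluation variant. All function names, the `score_fn` parameter, and the toy data are hypothetical illustrations. One plausible mechanism for the Recall result is purely definitional: Recall's denominator is the number of the user's relevant (held-out) items, which grows with user activity, so data-rich users need many more hits in the top K to reach the same score.

```python
# Illustrative sketch of common top-K evaluation metrics.
# Function names and toy data are hypothetical, not from the paper.
import random

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-K recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k=10):
    """Fraction of the user's relevant items retrieved in the top K.
    The denominator |relevant| grows with user activity, so data-rich
    users need more hits to match a data-poor user's score."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def average_precision_at_k(recommended, relevant, k=10):
    """AP@K: average of Precision@i over the ranks i where hits occur
    (normalised by min(|relevant|, k), one common convention)."""
    if not relevant:
        return 0.0
    score, hits = 0.0, 0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k)

def sampled_hit_at_k(score_fn, target_item, all_items, interacted,
                     n_negatives=100, k=10):
    """Sampled evaluation: rank one held-out item against n_negatives
    randomly sampled non-interacted items instead of the full catalogue.
    Small candidate pools make high ranks easier to reach, which can
    inflate Recall-style metrics. `score_fn` maps an item to a score."""
    pool = [i for i in all_items if i not in interacted and i != target_item]
    candidates = random.sample(pool, n_negatives) + [target_item]
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return 1.0 if target_item in ranked[:k] else 0.0
```

A toy comparison shows how the same recommendation list scores very differently under Recall for the two user groups:

```python
recommended = [3, 7, 1, 9, 4]
relevant_poor = {7}            # data-poor user: one held-out item
relevant_rich = {7, 2, 5, 8}   # data-rich user: many held-out items
print(recall_at_k(recommended, relevant_poor, k=5))  # 1.0
print(recall_at_k(recommended, relevant_rich, k=5))  # 0.25
```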
