Group Validation in Recommender Systems: Framework for Multi-layer Performance Evaluation

The evaluation of recommender systems has continued to evolve in recent years. Several efforts have sought to standardize assessment processes and to propose alternative metrics better suited to measuring effective personalization. However, standard evaluation tools provide only a general overview of a system’s performance, and, as most recent studies on the topic show, they are applied inconsistently and with limited effectiveness. Furthermore, traditional evaluation techniques fail to detect potentially harmful data in small subsets, and they generally lack explainable features for interpreting how such minor variations affect the system’s performance. This proposal focuses on data clustering for recommender evaluation and applies a cluster assessment technique to locate such performance issues. Our new approach, named group validation, helps spot critical performance variability in compact subsets of the system’s data and uncovers hidden weaknesses in predictions, where such unfavorable variations generally go unnoticed by typical assessment methods. Group validation for recommenders is a modular evaluation layer that complements regular evaluation and adds a new perspective to the evaluation process. It also enables several applications within the recommender ecosystem, such as model evolution tests, fraud/attack detection, and hosting a hybrid model setup.
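The idea of evaluating per group rather than only globally can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each prediction already carries a cluster label (produced by any clustering step), uses RMSE as the metric, and flags a group when its error exceeds the global error by a hypothetical `tolerance`. The function names `rmse` and `group_validation` and the toy records are illustrative assumptions.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root-mean-square error over a non-empty list of (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def group_validation(records, tolerance=0.25):
    """Compare each group's RMSE with the global RMSE.

    records: non-empty iterable of (group_label, actual, predicted).
    tolerance: hypothetical margin above the global RMSE that marks a
               group as a performance outlier (an assumption of this sketch).
    Returns (global_rmse, per_group_rmse, flagged_groups).
    """
    records = list(records)
    by_group = defaultdict(list)
    for label, actual, predicted in records:
        by_group[label].append((actual, predicted))

    global_rmse = rmse([(a, p) for _, a, p in records])
    per_group = {g: rmse(pairs) for g, pairs in by_group.items()}
    flagged = sorted(g for g, err in per_group.items()
                     if err > global_rmse + tolerance)
    return global_rmse, per_group, flagged


# Toy data: groups A and B are predicted well; the small group C is not.
records = [
    ("A", 4.0, 4.1), ("A", 3.0, 3.2), ("A", 5.0, 4.8), ("A", 2.0, 2.1),
    ("B", 4.0, 3.9), ("B", 1.0, 1.2), ("B", 3.0, 3.1), ("B", 5.0, 4.9),
    ("C", 5.0, 2.0), ("C", 1.0, 4.0),
]
global_rmse, per_group, flagged = group_validation(records)
print(f"global RMSE: {global_rmse:.3f}")
for group, err in sorted(per_group.items()):
    print(f"group {group}: RMSE {err:.3f}")
print("flagged groups:", flagged)
```

In this toy data the global score blends the well-predicted majority with the poorly predicted group C, so only the per-group view shows where the error is concentrated. In practice the metric and the flagging rule would need to be chosen for the system being evaluated.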
