On the discriminative power of hyper-parameters in cross-validation and how to choose them

Hyper-parameter tuning is a crucial task to make a model perform at its best. However, despite well-established methodologies, some aspects of tuning remain unexplored. For example, tuning may affect not only accuracy but also novelty, and its effect may depend on the adopted dataset. Moreover, it may sometimes be sufficient to concentrate on a single parameter (or a few of them) instead of the full set. In this paper we report on our investigation of hyper-parameter tuning by performing an extensive 10-fold cross-validation on MovieLens and Amazon Movies for three well-known baselines: User-kNN, Item-kNN, and BPR-MF. We adopted a grid-search strategy considering approximately 15 values for each parameter, and we then evaluated each combination of parameters in terms of accuracy and novelty. We investigated the discriminative power of nDCG, Precision, Recall, MRR, EFD, and EPC, and finally we analyzed the role of the individual parameters in model evaluation under cross-validation.
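
To make the protocol concrete, the sketch below mimics the pipeline described above on toy data: a grid of candidate values for one Item-kNN hyper-parameter (the neighbourhood size) is scored with 10-fold cross-validation and ranked by mean nDCG@10. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic interaction matrix, the grid values, and the helpers item_knn_scores and ndcg_at_n are all illustrative stand-ins for the MovieLens / Amazon Movies experiments.

```python
# Minimal sketch of grid search + k-fold CV for an Item-kNN recommender.
# Toy data and helper functions are illustrative assumptions, not the paper's setup.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_ITEMS, K_FOLDS, TOP_N = 200, 100, 10, 10

# Toy implicit-feedback matrix standing in for MovieLens / Amazon Movies.
R = (rng.random((N_USERS, N_ITEMS)) < 0.05).astype(float)

def item_knn_scores(train, k_neighbours):
    """Score items via cosine item-item similarity, keeping k neighbours per item."""
    norms = np.linalg.norm(train, axis=0) + 1e-12
    sim = (train.T @ train) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)
    if k_neighbours < N_ITEMS:
        thresh = np.sort(sim, axis=1)[:, -k_neighbours][:, None]
        sim = np.where(sim >= thresh, sim, 0.0)
    return train @ sim.T

def ndcg_at_n(scores, train, test, n=TOP_N):
    """Mean nDCG@n over users with at least one held-out positive item."""
    scores = np.where(train > 0, -np.inf, scores)  # never re-recommend seen items
    vals = []
    for u in range(scores.shape[0]):
        rel = test[u]
        if rel.sum() == 0:
            continue
        top = np.argsort(-scores[u])[:n]
        dcg = (rel[top] / np.log2(np.arange(2, n + 2))).sum()
        idcg = (1.0 / np.log2(np.arange(2, min(int(rel.sum()), n) + 2))).sum()
        vals.append(dcg / idcg)
    return float(np.mean(vals))

# 10-fold CV: each positive interaction is assigned to one fold and held out there.
fold_of = rng.integers(0, K_FOLDS, size=R.shape)
grid = {"k_neighbours": [5, 10, 20, 50, 80]}  # the paper scans ~15 values per parameter

results = {}
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    fold_scores = []
    for f in range(K_FOLDS):
        test = np.where((fold_of == f) & (R > 0), 1.0, 0.0)
        train = R - test
        fold_scores.append(ndcg_at_n(item_knn_scores(train, **params), train, test))
    results[tuple(params.items())] = np.mean(fold_scores)

best = max(results, key=results.get)
print("best configuration:", dict(best), "mean nDCG@10:", round(results[best], 4))
```

The same loop extends to the other baselines and metrics by swapping the scoring function (e.g. a BPR-MF model) and the evaluation measure (Precision, Recall, MRR, EFD, EPC), which is how per-metric discriminative power can then be compared across parameter configurations.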
