Economic Evaluation of Recommender Systems: A Proposal

Evaluating information retrieval effectiveness with fewer topics (queries) has been studied for some years: the approach can potentially save resources without sacrificing evaluation reliability. We propose applying it to the evaluation of recommender systems. We describe our proposal and detail what is needed to put it into practice.
