Reproduce. Generalize. Extend. On Information Retrieval Evaluation without Relevance Judgments
Stefano Mizzaro | Giuseppe Serra | Marco Passon | Kevin Roitero
[1] Rabia Nuray-Turan, et al. Automatic ranking of retrieval systems in imperfect environments, 2003, SIGIR '03.
[2] Norbert Fuhr, et al. Some Common Mistakes In IR Evaluation, And How They Can Be Avoided, 2018, SIGIR Forum.
[3] Stephen E. Robertson, et al. On the Contributions of Topics to System Evaluation, 2011, ECIR.
[4] Alistair Moffat, et al. A similarity measure for indefinite rankings, 2010, TOIS.
[5] Elad Yom-Tov, et al. Estimating the query difficulty for information retrieval, 2010, Synthesis Lectures on Information Concepts, Retrieval, and Services.
[6] O. J. Dunn. Multiple Comparisons among Means, 1961.
[7] Emine Yilmaz, et al. Estimating average precision with incomplete and imperfect judgments, 2006, CIKM '06.
[8] Ellen M. Voorhees, et al. Overview of the TREC 2004 Robust Track, 2004.
[9] Justin Zobel, et al. How reliable are the results of large-scale information retrieval experiments?, 1998, SIGIR '98.
[10] Shariq Bashir. Combining pre-retrieval query quality predictors using genetic programming, 2013, Applied Intelligence.
[11] Chris Buckley, et al. Topic prediction based on comparative retrieval rankings, 2004, SIGIR '04.
[12] Stephen E. Robertson, et al. On Using Fewer Topics in Information Retrieval Evaluations, 2013, ICTIR.
[13] Noriko Kando, et al. Increasing Reproducibility in IR: Findings from the Dagstuhl Seminar on "Reproducibility of Data-Oriented Experiments in e-Science", 2016, SIGIR Forum.
[14] Ben Carterette, et al. The effect of assessor error on IR system evaluation, 2010, SIGIR.
[15] J. Shane Culpepper, et al. The effect of pooling and evaluation depth on IR metrics, 2016, Information Retrieval Journal.
[16] Stephen E. Robertson, et al. On GMAP: and other transformations, 2006, CIKM '06.
[17] Josiane Mothe, et al. Human-Based Query Difficulty Prediction, 2017, ECIR.
[18] Nicola Ferro, et al. Reproducibility Challenges in Information Retrieval Evaluation, 2017, ACM J. Data Inf. Qual.
[19] Stephen E. Robertson, et al. A new rank correlation coefficient for information retrieval, 2008, SIGIR '08.
[20] Jong-Hak Lee, et al. Analyses of multiple evidence combination, 1997, SIGIR '97.
[21] Javed A. Aslam, et al. On the effectiveness of evaluating retrieval systems in the absence of relevance judgments, 2003, SIGIR '03.
[22] Shengli Wu, et al. Data fusion with estimated weights, 2002, CIKM '02.
[23] Shengli Wu, et al. Methods for ranking information retrieval systems without relevance judgments, 2003, SAC '03.
[24] Josiane Mothe, et al. Linguistic features to predict query difficulty, 2005, SIGIR '05.
[25] Josiane Mothe, et al. Why do you Think this Query is Difficult?: A User Study on Human Query Prediction, 2016, SIGIR.
[26] Donna K. Harman, et al. Overview of the Eighth Text REtrieval Conference (TREC-8), 1999, TREC.
[27] Stephen E. Robertson, et al. Hits hits TREC: exploring IR evaluation results with network analysis, 2007, SIGIR.
[28] Franciska de Jong, et al. Retrieval system evaluation: automatic evaluation versus incomplete judgments, 2010, SIGIR '10.
[29] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval, 2021, J. Documentation.
[30] Allan Hanbury, et al. The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias, 2016, ICTIR.
[31] Falk Scholer, et al. Effective Pre-retrieval Query Performance Prediction Using Similarity and Variability Evidence, 2008, ECIR.
[32] Charles L. A. Clarke, et al. The TREC 2006 Terabyte Track, 2006, TREC.
[33] David E. Losada, et al. Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems, 2017, Inf. Process. Manag.
[34] Ellen M. Voorhees, et al. The effect of topic set size on retrieval experiment error, 2002, SIGIR '02.
[35] Albert N. Link, et al. Economic impact assessment of NIST's Text REtrieval Conference (TREC) program: final report, 2010.
[36] Oren Kurland, et al. Predicting Query Performance by Query-Drift Estimation, 2009, ICTIR.
[37] Oren Kurland, et al. Predicting Query Performance by Query-Drift Estimation, 2009, TOIS.
[38] Cyril W. Cleverdon, et al. The significance of the Cranfield tests on index languages, 1991, SIGIR '91.
[39] Anselm Spoerri, et al. How the overlap between the search results of different retrieval systems correlates with document relevance, 2006, ASIST.
[40] Donna K. Harman, et al. Overview of the Reliable Information Access Workshop, 2009, Information Retrieval.
[41] Eddy Maddalena, et al. Do Easy Topics Predict Effectiveness Better Than Difficult Topics?, 2017, ECIR.
[42] Peter Bailey, et al. Tasks, Queries, and Rankers in Pre-Retrieval Performance Prediction, 2017, ADCS.
[43] Donna K. Harman, et al. The NRRC reliable information access (RIA) workshop, 2004, SIGIR '04.
[44] Djoerd Hiemstra, et al. A survey of pre-retrieval query performance predictors, 2008, CIKM '08.
[45] Ian Soboroff, et al. Ranking retrieval systems without relevance judgments, 2001, SIGIR '01.
[46] and software — performance evaluation.
[47] Anselm Spoerri, et al. Using the structure of overlap between search results to rank retrieval systems without relevance judgments, 2007, Inf. Process. Manag.
[48] Stephen E. Robertson, et al. A few good topics: Experiments in topic set reduction for retrieval evaluation, 2009, TOIS.