Fewer topics? A million topics? Both?! On topics subsets in test collections
J. Shane Culpepper | Mark Sanderson | Falk Scholer | Stefano Mizzaro | Kevin Roitero
[1] Alistair Moffat, et al. Statistical power in retrieval experimentation, 2008, CIKM '08.
[2] David J. Sheskin, et al. Handbook of Parametric and Nonparametric Statistical Procedures, 1997.
[3] Stefano Mizzaro, et al. IR Evaluation without a Common Set of Topics, 2009, ICTIR.
[4] Ingemar J. Cox, et al. Selecting a Subset of Queries for Acquisition of Further Relevance Judgements, 2011, ICTIR.
[5] Ellen M. Voorhees, et al. Evaluating Evaluation Measure Stability, 2000, SIGIR 2000.
[6] Julián Urbano, et al. Stochastic Simulation of Test Collections: Evaluation Scores, 2018, SIGIR.
[7] James Allan, et al. Minimal test collections for retrieval evaluation, 2006, SIGIR.
[8] James Allan, et al. If I Had a Million Queries, 2009, ECIR.
[9] Stefano Mizzaro, et al. Effectiveness Evaluation with a Subset of Topics: A Practical Approach, 2018, SIGIR.
[10] Julián Urbano, et al. Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation, 2016, Information Retrieval Journal.
[11] Mark Sanderson, et al. Problems with Kendall's tau, 2007, SIGIR.
[12] Mónica Marrero, et al. On the measurement of test collection reliability, 2013, SIGIR.
[13] Mark Sanderson, et al. Information retrieval system evaluation: effort, sensitivity, and reliability, 2005, SIGIR '05.
[14] Stefano Mizzaro, et al. Reproduce and Improve, 2018, ACM J. Data Inf. Qual.
[15] Stephen E. Robertson, et al. On the Contributions of Topics to System Evaluation, 2011, ECIR.
[16] Ellen M. Voorhees, et al. The effect of topic set size on retrieval experiment error, 2002, SIGIR '02.
[17] Pu Li, et al. Test theory for assessing IR test collections, 2007, SIGIR.
[18] Ben Carterette, et al. Multiple testing in statistical analysis of systems-based information retrieval experiments, 2012, TOIS.
[19] James E. Bartlett, et al. Organizational research: Determining appropriate sample size in survey research, 2001.
[20] Stephen E. Robertson, et al. A few good topics: Experiments in topic set reduction for retrieval evaluation, 2009, TOIS.
[21] Djoerd Hiemstra, et al. Relying on topic subsets for system ranking estimation, 2009, CIKM.
[22] Tetsuya Sakai, et al. Alternatives to Bpref, 2007, SIGIR.
[23] Emine Yilmaz, et al. Representative & Informative Query Selection for Learning to Rank using Submodular Functions, 2015, SIGIR.
[24] Alistair Moffat, et al. Models and metrics: IR evaluation as a user process, 2012, ADCS.
[25] Ingemar J. Cox, et al. Prioritizing relevance judgments to improve the construction of IR test collections, 2011, CIKM '11.
[26] Ben Carterette, et al. Million Query Track 2007 Overview, 2008, TREC.
[27] Ben Carterette, et al. Hypothesis testing with incomplete relevance judgments, 2007, CIKM '07.
[28] Stephen E. Robertson, et al. Hits hits TREC: exploring IR evaluation results with network analysis, 2007, SIGIR.
[29] Tetsuya Sakai, et al. Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015, 2016, SIGIR.
[30] Tetsuya Sakai, et al. Designing Test Collections for Comparing Many Systems, 2014, CIKM.
[31] Eddy Maddalena, et al. Do Easy Topics Predict Effectiveness Better Than Difficult Topics?, 2017, ECIR.
[32] Tamer Elsayed, et al. Intelligent topic selection for low-cost information retrieval evaluation: A New perspective on deep vs. shallow judging, 2017, Inf. Process. Manag.
[33] Milad Shokouhi, et al. An uncertainty-aware query selection model for evaluation of IR systems, 2012, SIGIR '12.
[34] Djoerd Hiemstra, et al. A Case for Automatic System Evaluation, 2010, ECIR.
[35] Tetsuya Sakai, et al. Topic set size design, 2015, Information Retrieval Journal.
[36] Stephen E. Robertson, et al. On Using Fewer Topics in Information Retrieval Evaluations, 2013, ICTIR.
[37] Daniel E. Rose, et al. Understanding user goals in web search, 2004, WWW '04.
[38] R. Feise. Do multiple outcome measures require p-value adjustment?, 2002, BMC Medical Research Methodology.
[39] Anand Rajaraman, et al. Mining of Massive Datasets, 2011.