Statistical Significance Testing in Information Retrieval: Theory and Practice
[1] J. Berger. Could Fisher, Jeffreys and Neyman Have Agreed on Testing?, 2003.
[2] Ben Carterette, et al. Hypothesis testing with incomplete relevance judgments, 2007, CIKM '07.
[3] Regina Nuzzo, et al. Scientific method: Statistical errors, 2014, Nature.
[4] Ellen M. Voorhees, et al. Variations in relevance judgments and the measurement of retrieval effectiveness, 1998, SIGIR '98.
[5] Jean Tague-Sutcliffe, et al. The Pragmatics of Information Retrieval Experimentation Revisited, 1997, Inf. Process. Manag.
[6] Alistair Moffat, et al. Statistical power in retrieval experimentation, 2008, CIKM '08.
[7] Mark Sanderson, et al. Test Collection Based Evaluation of Information Retrieval Systems, 2010, Found. Trends Inf. Retr.
[8] Ellen M. Voorhees, et al. The effect of topic set size on retrieval experiment error, 2002, SIGIR '02.
[9] J. Ioannidis. Why Most Published Research Findings Are False, 2005, PLoS Medicine.
[10] Leonid Boytsov, et al. Deciding on an adjustment for multiplicity in IR experiments, 2013, SIGIR.
[11] Justin Zobel, et al. How reliable are the results of large-scale information retrieval experiments?, 1998, SIGIR '98.
[12] Alistair Moffat, et al. What Does It Mean to "Measure Performance"?, 2004, WISE.
[13] Mark Sanderson, et al. Information retrieval system evaluation: effort, sensitivity, and reliability, 2005, SIGIR '05.
[14] Ben Carterette, et al. Reusable test collections through experimental design, 2010, SIGIR.
[15] Gobinda G. Chowdhury, et al. TREC: Experiment and Evaluation in Information Retrieval, 2007.
[16] Ellen M. Voorhees, et al. TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing), 2005.
[17] Ben Carterette. Model-Based Inference about IR Systems, 2011, ICTIR.
[18] M. Artés. Statistical errors, 1977, Medicina Clínica.
[19] Alistair Moffat, et al. Improvements that don't add up: ad-hoc retrieval results since 1998, 2009, CIKM.
[20] Ben Carterette, et al. Simulating simple user behavior for system effectiveness evaluation, 2011, CIKM '11.
[21] Ben Carterette, et al. Multiple testing in statistical analysis of systems-based information retrieval experiments, 2012, TOIS.
[22] James Blustein, et al. A Statistical Analysis of the TREC-3 Data, 1995, TREC.
[23] Mónica Marrero, et al. A comparison of the optimality of statistical significance tests for information retrieval evaluation, 2013, SIGIR.
[24] Douglas H. Johnson. The Insignificance of Statistical Significance Testing, 1999.
[25] Ben Carterette, et al. Incorporating variability in user behavior into systems based evaluation, 2012, CIKM.
[26] Gordon V. Cormack, et al. Statistical precision of information retrieval evaluation, 2006, SIGIR.
[27] James Allan, et al. A comparison of statistical significance tests for information retrieval evaluation, 2007, CIKM '07.
[28] Peter Willett, et al. Readings in information retrieval, 1997.
[29] J. Allan, et al. Readings in information retrieval, 1998.
[30] J. Ioannidis. Contradicted and initially stronger effects in highly cited clinical research, 2005, JAMA.
[31] James Allan, et al. Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes, 2009, SIGIR.