How do Metric Score Distributions affect the Type I Error Rate of Statistical Significance Tests in Information Retrieval?
[1] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates , 1941 .
[2] James Allan,et al. A comparison of statistical significance tests for information retrieval evaluation , 2007, CIKM '07.
[3] Jacques Savoy,et al. Statistical inference in retrieval effectiveness evaluation , 1997, Inf. Process. Manag.
[4] W. John Wilbur,et al. Non-parametric significance tests of retrieval performance comparisons , 1994, J. Inf. Sci.
[5] Margaret J. Robertson,et al. Design and Analysis of Experiments , 2006, Handbook of statistics.
[6] Tetsuya Sakai,et al. Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015 , 2016, SIGIR.
[7] Wilkie W. Chaffin,et al. The effect of skewness and kurtosis on the one-sample T test and the impact of knowledge of the population standard deviation , 1993 .
[8] David E. Losada,et al. Testing the tests: simulation of rankings to compare statistical significance tests in information retrieval evaluation , 2021, SAC.
[9] David E. Losada,et al. Using score distributions to compare statistical significance tests for information retrieval evaluation , 2019, J. Assoc. Inf. Sci. Technol.
[10] R. Blair,et al. A more realistic look at the robustness and Type II error properties of the t test to departures from population normality , 1992 .
[11] B. Efron. Nonparametric standard errors and confidence intervals , 1981 .
[12] Julián Urbano,et al. Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation , 2016, Information Retrieval Journal.
[13] On the robusness of the one one sample t test , 1989 .
[14] Mónica Marrero,et al. A comparison of the optimality of statistical significance tests for information retrieval evaluation , 2013, SIGIR.
[15] David A. Hull. Using statistical testing in the evaluation of retrieval experiments , 1993, SIGIR.
[16] Alan Hanjalic,et al. Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors , 2019, SIGIR.
[17] Julián Urbano,et al. Stochastic Simulation of Test Collections: Evaluation Scores , 2018, SIGIR.
[18] Mark Sanderson,et al. Information retrieval system evaluation: effort, sensitivity, and reliability , 2005, SIGIR '05.
[19] Gordon V. Cormack,et al. Validity and power of t-test for comparing MAP and GMAP , 2007, SIGIR.
[20] James Allan,et al. Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes , 2009, SIGIR.
[21] Ben Carterette,et al. But Is It Statistically Significant?: Statistical Significance in IR Research, 1995-2014 , 2017, SIGIR.
[22] Ellen M. Voorhees,et al. Topic set size redux , 2009, SIGIR.
[23] Jonathan A. Tawn,et al. Bivariate extreme value theory: Models and estimation , 1988 .
[24] S. R. Searle,et al. Population Marginal Means in the Linear Model: An Alternative to Least Squares Means , 1980 .
[25] R. Manmatha,et al. Modeling score distributions for combining the outputs of search engines , 2001, SIGIR '01.
[26] Justin Zobel,et al. How reliable are the results of large-scale information retrieval experiments? , 1998, SIGIR '98.