Statistical Significance of MUC-6 Results
The results of the MUC-6 evaluation must be analyzed to determine whether close scores reflect real differences between systems or are merely a matter of chance. For this purpose, SAIC developed a computer-intensive hypothesis testing method for the MUC-3 results, and it has been used to distinguish MUC scores ever since. The implementation of this method for the MUC evaluations was first described in [1], and the concepts behind the statistical model were later explained more accessibly in [2]. This paper gives the results of the statistical testing for the three MUC-6 tasks in which a single metric could be associated with a system's performance.
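The abstract does not spell out the testing procedure. As a rough illustration of what a computer-intensive hypothesis test can look like, the sketch below runs a paired approximate randomization test on two systems' scores. Everything in it is an assumption made for illustration: the function name `paired_randomization_test`, the use of per-message scores, the mean difference as the test statistic, and the default of 9999 shuffles. It should not be read as the exact method used for the MUC evaluations.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=9999, seed=0):
    """Estimate a two-sided p-value for the mean score difference.

    scores_a and scores_b are hypothetical per-message scores for two
    systems evaluated on the same messages (an illustrative assumption).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")

    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / n

    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two system labels are
        # exchangeable, so each paired difference keeps or flips its
        # sign with probability 0.5.
        shuffled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(shuffled) / n >= observed:
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    system_a = [0.9] * 20
    system_b = [0.1] * 20
    print(paired_randomization_test(system_a, system_b))
```

A small p-value suggests that the observed gap would rarely arise from chance relabeling alone, while a p-value near 1 means the two systems cannot be distinguished on these scores.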
[1] Alexander Dekhtyar, et al. Information Retrieval, 2018, Lecture Notes in Computer Science.
[2] K. J. Evans, et al. Computer Intensive Methods for Testing Hypotheses: An Introduction, 1990.
[3] The statistical significance of the MUC-4 results, 1992, MUC.
[4] Lynette Hirschman, et al. Evaluating Message Understanding Systems: An Analysis of the Third Message Understanding Conference (MUC-3), 1993, CL.