A general method for statistical performance evaluation

In this paper, we propose a general method for statistical performance evaluation. The method incorporates several statistical metrics and automatically selects an appropriate one according to the problem parameters. Through simulation, we compare the performance of five representative statistical metrics under different conditions: expected loss, the Friedman statistic, interval-based selection, probability of win, and probably approximately correct. In these experiments, expected loss performs best for small means, such as 1 or 2, and probably approximately correct performs best in all other cases. We also apply the general method to compare HITS-based algorithms combined with four relevance-scoring methods (VSM, Okapi, TLS, and CDR) on a set of broad-topic queries. Among the four relevance-scoring methods, CDR is statistically the best when combined with a HITS-based algorithm.
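As a rough illustration of one of the five metrics named above, the sketch below computes the textbook Friedman rank statistic for a table of scores, with one row per test case and one column per candidate method. It is not taken from the paper: the function names, the input layout, and the omission of a tie correction are all assumptions made for this example, and the paper's own use of the statistic may differ.

```python
def average_ranks(row):
    """Rank the values in one row (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for idx in range(pos, end + 1):
            ranks[order[idx]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Textbook Friedman statistic, without tie correction (an assumption).

    scores: list of n rows (test cases), each with k values (methods).
    """
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical scores: 4 test cases (rows) by 3 methods (columns).
    demo = [
        [0.61, 0.70, 0.66],
        [0.55, 0.72, 0.60],
        [0.58, 0.69, 0.71],
        [0.50, 0.65, 0.57],
    ]
    print(friedman_statistic(demo))
```

A larger value indicates that the methods' rank sums differ more than would be expected if all methods performed alike; for a significance test, the value is usually compared against a chi-square distribution with k - 1 degrees of freedom.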
