Paper information - A Bootstrap Test For Comparing Performance Of Programs When Data Are Censored, And Comparisons To Etzioni's Test

Experimental trials of programs are sometimes aborted when resource bounds are exceeded. The data from these trials are called censored data. This paper discusses the inferences that can be drawn from samples that include censored data. A key component of statistical inference, the sampling distribution, is generally not known for censored samples. However, the bootstrap procedure has been applied to estimate empirically the sampling distributions of many statistics. We show how to use the bootstrap to estimate the sampling distribution of the difference of means of two censored samples, enabling many comparisons that were previously ad hoc, such as the comparison of run times of algorithms when some run times exceed a limit. The reader will see how to extend the bootstrap to other tests with censored data. We also describe a test due to Etzioni and Etzioni for the difference of two censored samples. We show that the bootstrap test is more powerful, primarily because it does not make the strong guarantee that is a feature of Etzioni's test.
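The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a bootstrap test for the difference of means of two censored samples could look. It assumes that run times above a resource bound are recorded as the bound itself, and that the null distribution is approximated by resampling with replacement from the pooled censored data. The names `censored_mean` and `bootstrap_censored_test`, the pooling scheme, and the two-sided p-value are illustrative assumptions, not the authors' published method.

```python
import random


def censored_mean(sample, bound):
    """Mean after replacing every value above the bound with the bound itself."""
    return sum(min(x, bound) for x in sample) / len(sample)


def bootstrap_censored_test(a, b, bound, n_boot=2000, seed=0):
    """Two-sided bootstrap test for the difference of censored means.

    Assumption: under the null hypothesis both samples come from the same
    distribution, so pseudo-samples are drawn from the pooled censored data.
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = censored_mean(a, bound) - censored_mean(b, bound)

    # Pool the censored observations from both samples (lists expected).
    pooled = [min(x, bound) for x in a + b]

    extreme = 0
    for _ in range(n_boot):
        # Draw pseudo-samples of the original sizes, with replacement.
        resampled_a = [rng.choice(pooled) for _ in a]
        resampled_b = [rng.choice(pooled) for _ in b]
        diff = sum(resampled_a) / len(resampled_a) - sum(resampled_b) / len(resampled_b)
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_boot + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical run times in seconds; trials were aborted at 20 seconds.
    runs_a = [1, 2, 3, 4, 50]
    runs_b = [10, 12, 15, 40, 60]
    print(bootstrap_censored_test(runs_a, runs_b, bound=20))
```

A small estimated p-value suggests that the observed difference in censored means would be unusual if both programs had the same run-time distribution. Note that the statistic compares means of the censored values, not of the unobserved true run times.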

Paul R. Cohen | John B. Kim
