Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation

This paper is concerned with the estimation of a classifier's accuracy. We present a number of novel bootstrap estimators, based on kernel smoothing, that consistently outperform other established methods on both synthetic and real data. We call the process of (re)sampling the data via kernel-based smoothed bootstrap "data cloning". The new cloning methods outperform cross-validation and the .632+ bootstrap, which, according to Efron and Tibshirani, is the estimator of choice. Finally, we extend our estimators to complex real-life data sets, in which a data point may include real-valued, bounded, integer, and nominal attributes, thus allowing for better classifier evaluation over limited real data repositories such as the UCI repository.
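The core resampling idea the abstract calls "data cloning" can be illustrated with a minimal sketch: draw an ordinary bootstrap sample with replacement, then perturb each draw with kernel noise so the clones come from a smoothed (kernel density) estimate rather than the empirical distribution. The sketch below assumes a one-dimensional Gaussian kernel with Silverman's rule-of-thumb bandwidth; the function name and these choices are ours for illustration, not the paper's specific estimators.

```python
import numpy as np

def smoothed_bootstrap(data, n_samples, bandwidth=None, rng=None):
    """Kernel-smoothed bootstrap: resample with replacement, then add
    Gaussian kernel noise so samples follow the KDE, not the raw data."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    n = len(data)
    if bandwidth is None:
        # Silverman's rule of thumb for a Gaussian kernel in 1-D
        bandwidth = 1.06 * data.std(ddof=1) * n ** (-1 / 5)
    idx = rng.integers(0, n, size=n_samples)          # plain bootstrap step
    return data[idx] + rng.normal(0.0, bandwidth, size=n_samples)  # smoothing step
```

Setting `bandwidth=0` recovers the ordinary (unsmoothed) bootstrap; the smoothed variant slightly inflates the variance by roughly the squared bandwidth, which is the usual trade-off discussed in the smoothed-bootstrap literature.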

[1] G. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, 1992.

[2] G. A. Young et al., The Bootstrap: To Smooth or Not to Smooth?, 1987.

[3] D. W. Scott et al., Multivariate Density Estimation: Theory, Practice and Visualization, 1992.

[4] P. J. Green et al., Density Estimation for Statistics and Data Analysis, 1987.

[5] C. Elkan et al., Estimating the Accuracy of Learned Concepts, 1993, IJCAI.

[6] T. Joachims et al., Estimating the Generalization Performance of an SVM Efficiently, 2000, ICML.

[7] S. Abe, Pattern Classification, 2001, Springer London.

[8] A. M. Polansky et al., Smoothed Bootstrap Confidence Intervals with Discrete Data, 1997.

[9] D. Ron et al., Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation, 1997, Neural Computation.

[10] T. J. DiCiccio et al., On Smoothing and the Bootstrap, 1989.

[11] R. Tibshirani et al., Improvements on Cross-Validation: The .632+ Bootstrap Method, 1997.

[12] L. Devroye et al., Nonparametric Density Estimation, 1985.

[13] L. Devroye et al., Nonparametric Density Estimation: The L1 View, 1985.

[14] K. Fukunaga, Introduction to Statistical Pattern Recognition, 1972.

[15] R. Kohavi et al., Bias Plus Variance Decomposition for Zero-One Loss Functions, 1996, ICML.

[16] T. G. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[17] S. P. Brooks, Bootstrap Methods and Their Application, 1998.

[18] C. Blake et al., UCI Repository of Machine Learning Databases, 1998.

[19] A. Guillou et al., On the Smoothed Bootstrap, 2000.

[20] M. C. Jones et al., A Brief Survey of Bandwidth Selection for Density Estimation, 1996.

[21] L. Devroye, Non-Uniform Random Variate Generation, 1986.

[22] J. Shao et al., The Jackknife and Bootstrap, 1996.

[23] D. J. Spiegelhalter et al., Machine Learning, Neural and Statistical Classification, 2009.

[24] B. Efron, Jackknife-After-Bootstrap Standard Errors and Influence Functions, 1992.

[25] B. Efron, Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation, 1983.

[26] J. Langford et al., Beating the Hold-Out: Bounds for K-Fold and Progressive Cross-Validation, 1999, COLT '99.

[27] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Edition, 1990.

[28] A. Papoulis, Probability, Random Variables and Stochastic Processes, 1985.

[29] J. Aitchison et al., Multivariate Binary Discrimination by the Kernel Method, 1976.

[30] R. Kohavi et al., A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, 1995, IJCAI.

[31] M. Chavance, Jackknife and Bootstrap, 1992, Revue d'épidémiologie et de santé publique.

[32] M. P. Wand et al., Kernel Smoothing, 1995.