Bolstered error estimation

Abstract

We propose a general method for error estimation that displays low variance and, generally, low bias as well. The method is based on "bolstering" the original empirical distribution of the data. It has a direct geometric interpretation and can be easily applied to any classification rule and any number of classes. It can be used to improve the performance of any error-counting estimation method, such as resubstitution and all cross-validation estimators, particularly in small-sample settings. We point out similarities between our method and a previously proposed technique known as smoothed error estimation. In some important cases, such as a linear classification rule with a Gaussian bolstering kernel, the integrals in the bolstered error estimate can be computed exactly. In the general case, the bolstered error estimate may be computed by Monte-Carlo sampling; however, our experiments show that only a very small number of Monte-Carlo samples is needed. The result is a fast error estimator, in contrast to other resampling techniques such as the bootstrap. We provide an extensive simulation study comparing the proposed method with resubstitution, cross-validation, and bootstrap error estimation, for three popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees), using several sample sizes, from small to moderate. The results indicate that the proposed method vastly improves on resubstitution and cross-validation, especially for small samples, in terms of bias and variance. In that respect, it is competitive with, and on many occasions superior to, bootstrap error estimation, while being tens to hundreds of times faster.
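The Monte-Carlo variant described above can be sketched in a few lines: each training point is replaced by a bolstering kernel centered at that point, samples are drawn from each kernel, and the error is the average fraction of samples the classifier assigns the wrong label. The sketch below is a minimal illustration, not the paper's implementation; the isotropic Gaussian kernel, the fixed bandwidth `sigma`, and the toy linear classifier are all assumptions for demonstration (the paper derives the kernel variance from the data).

```python
import numpy as np

def bolstered_resubstitution(X, y, classify, sigma=0.5, n_mc=25, seed=None):
    """Monte-Carlo bolstered resubstitution error estimate.

    For each training point x_i, draw n_mc samples from an isotropic
    Gaussian bolstering kernel N(x_i, sigma^2 I) and count the fraction
    that `classify` assigns a label different from y_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total_error = 0.0
    for xi, yi in zip(X, y):
        samples = rng.normal(loc=xi, scale=sigma, size=(n_mc, d))
        preds = classify(samples)
        total_error += np.mean(preds != yi)
    return total_error / n

# Toy linear classifier: label 1 iff the first coordinate is positive.
def classify(X):
    return (X[:, 0] > 0).astype(int)

X = np.array([[-1.0, 0.0], [-2.0, 1.0], [1.5, -0.5], [2.0, 0.3]])
y = np.array([0, 0, 1, 1])
est = bolstered_resubstitution(X, y, classify, sigma=0.5, n_mc=25, seed=0)
```

With `sigma = 0`, every sample coincides with its training point and the estimate reduces to plain resubstitution; increasing `sigma` spreads probability mass across the decision boundary, which is what counteracts the optimistic bias of resubstitution.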
We provide a companion web site, which contains: (1) the complete set of tables and plots regarding the simulation study, and (2) C source code used to implement the bolstered error estimators proposed in this paper, as part of a larger library for classification and error estimation, with full documentation and examples. The companion web site can be accessed at the URL http://ee.tamu.edu/~edward/bolster .
