Bagging for linear classifiers

Abstract

Classifiers built on small training sets are usually biased or unstable. Various techniques exist for constructing more stable classifiers, but it is not clear which of them work well, or whether they truly stabilize the classifier or merely improve its performance. In this paper, bagging (bootstrapping and aggregating) [L. Breiman, Bagging predictors, Machine Learning J. 24 (2), 123–140 (1996)] is studied for a number of linear classifiers. A measure of classifier instability is introduced, and the influence of regularization and bagging on this instability and on the generalization error of linear classifiers is investigated. A simulation study shows that, in general, bagging is not a stabilizing technique. It is also demonstrated that the instability of a classifier can be used to predict how useful bagging will be. Finally, it is shown experimentally that bagging may improve the performance of the classifier only in very unstable situations.
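To make the procedure concrete, the following is a minimal sketch of bagging applied to one simple linear classifier, the nearest mean classifier. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the aggregation rule, so the sketch assumes that the coefficients of the bootstrap classifiers are averaged (majority voting over their decisions is the other common choice). The function names `nearest_mean_fit`, `bagged_fit`, and `predict`, and the parameters `n_replicates` and `seed`, are hypothetical.

```python
import random


def nearest_mean_fit(X, y):
    """Fit a two-class nearest mean classifier.

    Returns (w, b) such that w.x + b > 0 assigns x to class 1.
    """
    dim = len(X[0])
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    m0, m1 = means[0], means[1]
    # The decision boundary is the perpendicular bisector of the two class means.
    w = [a - c for a, c in zip(m1, m0)]
    b = -sum(wj * (a + c) / 2.0 for wj, a, c in zip(w, m1, m0))
    return w, b


def bagged_fit(X, y, n_replicates=25, seed=0):
    """Bagging sketch: train on bootstrap replicates, then aggregate.

    Assumption: aggregation averages the linear coefficients (w, b).
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w_sum, b_sum, used = [0.0] * dim, 0.0, 0
    while used < n_replicates:
        # Bootstrap replicate: draw n training objects with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 not in yb or 1 not in yb:
            continue  # redraw replicates in which one class is missing
        w, b = nearest_mean_fit([X[i] for i in idx], yb)
        w_sum = [s + wj for s, wj in zip(w_sum, w)]
        b_sum += b
        used += 1
    return [s / used for s in w_sum], b_sum / used


def predict(model, x):
    """Return the class label (0 or 1) assigned to x by a linear model."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```

With a very small training set, each bootstrap replicate can differ noticeably from the original sample, which is the regime the abstract describes as unstable. Calling `bagged_fit(X, y)` on a list of feature vectors `X` with 0/1 labels `y` returns a single averaged linear model that `predict` can apply to new points.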
