An experimental bias-variance analysis of SVM ensembles based on resampling techniques

Recently, bias-variance decomposition of error has been used as a tool to study the behavior of learning algorithms and to develop new ensemble methods well suited to the bias-variance characteristics of base learners. We propose methods and procedures, based on Domingo's unified bias-variance theory, to evaluate and quantitatively measure the bias-variance decomposition of error in ensembles of learning machines. We apply these methods to study and compare the bias-variance characteristics of single support vector machines (SVMs) and ensembles of SVMs based on resampling techniques, and their relationships with the cardinality of the training samples. In particular, we present an experimental bias-variance analysis of bagged and random aggregated ensembles of SVMs in order to verify their theoretical variance reduction properties. The experimental bias-variance analysis quantitatively characterizes the relationships between bagging and random aggregating, and explains the reasons why ensembles built on small subsamples of the data work with large databases. Our analysis also suggests new directions for research to improve on classical bagging.

[1]  Lawrence O. Hall,et al.  A New Ensemble Diversity Measure Applied to Thinning Ensembles , 2003, Multiple Classifier Systems.

[2]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[3]  Ron Kohavi,et al.  Bias Plus Variance Decomposition for Zero-One Loss Functions , 1996, ICML.

[4]  Eric Bauer,et al.  An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants , 1999, Machine Learning.

[5]  Nitesh V. Chawla,et al.  Learning Ensembles from Bites: A Scalable and Accurate Approach , 2004, J. Mach. Learn. Res..

[6]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[7]  Tomaso A. Poggio,et al.  Bounds on the Generalization Performance of Kernel Machine Ensembles , 2000, ICML.

[8]  Wei Tang,et al.  Ensembling neural networks: Many could be better than all , 2002, Artif. Intell..

[9]  Xin Yao,et al.  Simultaneous training of negatively correlated neural networks in an ensemble , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[10]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[11]  Giorgio Valentini,et al.  Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods , 2004, J. Mach. Learn. Res..

[12]  Fabio Roli,et al.  Methods for Designing Multiple Classifier Systems , 2001, Multiple Classifier Systems.

[13]  Leo Breiman,et al.  Pasting Small Votes for Classification in Large Databases and On-Line , 1999, Machine Learning.

[14]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[15]  Nitesh V. Chawla,et al.  Distributed Pasting of Small Votes , 2002, Multiple Classifier Systems.

[16]  L. Breiman Arcing Classifiers , 1998 .

[17]  Samy Bengio,et al.  A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[18]  Thomas G. Dietterich,et al.  Error-Correcting Output Coding Corrects Bias and Variance , 1995, ICML.

[19]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[21]  Eugene M. Kleinberg,et al.  On the Algorithmic Implementation of Stochastic Discrimination , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Robert Tibshirani,et al.  Bias, Variance and Prediction Error for Classification Rules , 1996 .

[23]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[24]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Pedro M. Domingos A Unified Bias-Variance Decomposition for Zero-One and Squared Loss , 2000, AAAI/IAAI.

[26]  L. Breiman Arcing classifier (with discussion and a rejoinder by the author) , 1998 .

[27]  Giorgio Valentini,et al.  Bias-Variance Analysis and Ensembles of SVM , 2002, Multiple Classifier Systems.

[28]  Tomaso A. Poggio,et al.  Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.

[29]  Giorgio Valentini,et al.  Low Bias Bagged Support Vector Machines , 2003, ICML.

[30]  Fabio Roli,et al.  Design of effective neural network ensembles for image classification purposes , 2001, Image Vis. Comput..

[31]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[32]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[33]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[34]  Leo Breiman,et al.  Bias, Variance , And Arcing Classifiers , 1996 .

[35]  Nitesh V. Chawla,et al.  Bagging Is a Small-Data-Set Phenomenon , 2001, CVPR.

[36]  Gareth James,et al.  Variance and Bias for General Loss Functions , 2003, Machine Learning.

[37]  Jerome H. Friedman,et al.  On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality , 2004, Data Mining and Knowledge Discovery.

[39]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[40]  Kagan Tumer,et al.  Error Correlation and Error Reduction in Ensemble Classifiers , 1996, Connect. Sci..

[41]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[42]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[43]  J. Friedman,et al.  On bagging and nonlinear estimation , 2007 .

[44]  Hyun-Chul Kim,et al.  Pattern classification using support vector machine ensemble , 2002, Object recognition supported by user interaction for service robots.

[45]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[46]  Giorgio Valentini,et al.  NEURObjects: an object-oriented library for neural network development , 2002, Neurocomputing.

[47]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[48]  Giorgio Valentini,et al.  Bagged ensembles of Support Vector Machines for gene expression data analysis , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003..

[49]  Tom Heskes,et al.  Bias/Variance Decompositions for Likelihood-Based Estimators , 1998, Neural Computation.