Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods

Bias-variance analysis provides a tool to study learning algorithms and can be used to properly design ensemble methods tuned to the properties of a specific base learner. Indeed, the effectiveness of ensemble methods critically depends on the accuracy, diversity, and learning characteristics of the base learners. We present an extended experimental analysis of the bias-variance decomposition of the error in Support Vector Machines (SVMs), considering Gaussian, polynomial, and dot-product kernels. We characterize the error decomposition by analyzing the relationships between bias, variance, kernel type, and kernel parameters, offering insight into the way SVMs learn. The results show that the expected trade-off between bias and variance is sometimes observed, but more complex relationships can also be detected, especially with Gaussian and polynomial kernels. We show that the bias-variance decomposition offers a rationale for developing ensemble methods that use SVMs as base learners, and we outline two directions for developing SVM ensembles: one exploiting the bias characteristics of SVMs, and one exploiting the dependence of bias and variance on the kernel parameters.
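
To make the decomposition concrete, the following is a minimal sketch (not the paper's actual experimental protocol) of how the bias and variance of an RBF-kernel SVM can be estimated under the zero-one loss, in the spirit of Domingos' unified decomposition. The synthetic dataset, bootstrap resampling scheme, number of replicates, and SVC hyperparameters (C, gamma) are illustrative assumptions; a noise-free labelling is assumed, so the average error splits exactly into average bias plus unbiased variance minus biased variance.

```python
# Sketch: per-point bias/variance of an RBF-kernel SVM under zero-one loss,
# following Domingos' decomposition in the noise-free, two-class case.
# Dataset, number of replicates, and SVC hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

n_rounds = 100
preds = np.empty((n_rounds, len(y_te)), dtype=int)
for r in range(n_rounds):
    # Each round trains the same SVM on a bootstrap replicate of the training set.
    idx = rng.integers(0, len(y_tr), size=len(y_tr))
    clf = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X_tr[idx], y_tr[idx])
    preds[r] = clf.predict(X_te)

# Main prediction: majority vote across training sets (two-class labels 0/1).
main_pred = (preds.mean(axis=0) >= 0.5).astype(int)

# Zero-one-loss decomposition (noise-free case):
#   bias(x)     = 1 if the main prediction is wrong,
#   variance(x) = fraction of predictions deviating from the main prediction.
bias = (main_pred != y_te).astype(float)
variance = (preds != main_pred).mean(axis=0)

# Variance on unbiased points adds to the error; on biased points it subtracts.
unbiased_var = variance[bias == 0].sum() / len(y_te)
biased_var = variance[bias == 1].sum() / len(y_te)
avg_error = (preds != y_te).mean()

print(f"avg error      {avg_error:.3f}")
print(f"avg bias       {bias.mean():.3f}")
print(f"net variance   {unbiased_var - biased_var:.3f}")
print(f"bias + net var {bias.mean() + unbiased_var - biased_var:.3f}")
```

Repeating such an estimate over a grid of gamma (or polynomial degree) and C values is one way to trace how bias and variance depend on the kernel and its parameters, which is the kind of characterization the abstract refers to; low-bias parameter settings are then natural candidates for variance-reducing, bagging-style SVM ensembles.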
