New results on error correcting output codes of kernel machines

We study the problem of multiclass classification within the framework of error-correcting output codes (ECOC) using margin-based binary classifiers. Specifically, we address two important open problems in this context: decoding and model selection. The decoding problem concerns how to map the outputs of the binary classifiers into class codewords. In this paper we introduce a new decoding function that combines the margins through an estimate of their class-conditional probabilities. Concerning model selection, we present new theoretical results bounding the leave-one-out (LOO) error of ECOC of kernel machines, which can be used to tune kernel hyperparameters. We report experiments using support vector machines as the base binary classifiers, showing the advantage of the proposed decoding function over other functions of the margin commonly used in practice. Moreover, our empirical evaluations on model selection indicate that the bound leads to good estimates of kernel parameters.
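To make the decoding problem concrete, the following minimal sketch contrasts two standard baseline decoders: Hamming decoding, which uses only the signs of the classifier outputs, and loss-based decoding with the hinge loss, which uses the margins themselves. It is not the probabilistic decoding function proposed in the paper, which the abstract does not specify. The function names, the one-vs-all code matrix, and the margin values are illustrative assumptions, and the sketch assumes a binary code with entries in {-1, +1}.

```python
# Illustrative ECOC decoding baselines. These are NOT the paper's proposed
# decoder; the code matrix and the margins below are made-up example values.

def hamming_decode(code_matrix, margins):
    """Return the class whose codeword disagrees in sign with the fewest margins."""
    def distance(codeword):
        # A bit disagrees when the codeword entry and the margin differ in sign.
        return sum(1 for m, f in zip(codeword, margins) if m * f <= 0)
    return min(range(len(code_matrix)), key=lambda k: distance(code_matrix[k]))


def loss_decode(code_matrix, margins):
    """Return the class minimizing the total hinge loss max(0, 1 - m * f)."""
    def total_loss(codeword):
        return sum(max(0.0, 1.0 - m * f) for m, f in zip(codeword, margins))
    return min(range(len(code_matrix)), key=lambda k: total_loss(code_matrix[k]))


# One-vs-all code for 3 classes: row k is the codeword of class k,
# column s is the labeling used to train binary classifier s.
CODE = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]

# Hypothetical real-valued outputs of the three binary classifiers on one input.
margins = [0.1, 0.9, -0.5]

print(hamming_decode(CODE, margins))  # ties between classes 0 and 1; min() keeps 0
print(loss_decode(CODE, margins))     # the large margin 0.9 favors class 1
```

In this example the first two classifiers both output a positive value, so Hamming decoding cannot separate classes 0 and 1 and falls back on tie-breaking, whereas the loss-based decoder prefers class 1 because its classifier is far more confident. This is the kind of information a margin-aware decoding function is meant to exploit.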
