Stochastic Organization of Output Codes in Multiclass Learning Problems

The best-known decomposition schemes for multiclass learning problems are one-per-class coding (OPC) and error-correcting output coding (ECOC). Both methods perform an a priori decomposition, that is, the output codes are fixed before the classifier is trained. The impact of the output codes on the inferred decision rules, however, can be observed only after learning. Therefore, we present a novel algorithm for the code design of multiclass learning problems. This algorithm applies a maximum-likelihood objective function in conjunction with the expectation-maximization (EM) algorithm. Minimizing the augmented objective function yields the optimal decomposition of the multiclass learning problem into two-class problems. Experimental results show the potential gain of the optimized output codes over the OPC and ECOC methods.
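To make the two fixed baselines concrete, the following minimal Python sketch builds an OPC code matrix and compares it with a hand-written 7-bit error-correcting code for four classes, decoding by nearest Hamming distance. It illustrates only the a priori codes that the abstract contrasts against, not the EM-based code design proposed here. The function names, the example code matrix, and the 0.5 threshold are assumptions chosen for illustration.

```python
def opc_code(num_classes):
    """One-per-class code: row k has a 1 in column k and 0 elsewhere."""
    return [[1 if j == k else 0 for j in range(num_classes)]
            for k in range(num_classes)]


# Illustrative 7-bit error-correcting code for 4 classes (an assumed example).
# Each row is a class code word; each column defines one two-class problem.
ECOC_4 = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(code_matrix):
    """Smallest Hamming distance between any two distinct code words."""
    n = len(code_matrix)
    return min(hamming(code_matrix[i], code_matrix[j])
               for i in range(n) for j in range(i + 1, n))


def decode(code_matrix, outputs, threshold=0.5):
    """Binarize the two-class outputs and return the index of the nearest code word."""
    bits = [1 if o >= threshold else 0 for o in outputs]
    distances = [hamming(row, bits) for row in code_matrix]
    return distances.index(min(distances))
```

A code with minimum row distance d tolerates up to floor((d - 1) / 2) wrong two-class decisions. In this sketch the OPC matrix has d = 2 and so corrects none, while the example 7-bit code has d = 4 and recovers the class after a single wrong bit.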
