Solving Multiclass Learning Problems via Error-Correcting Output Codes

Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form (x_i, f(x_i)). Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
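The idea of an error-correcting code used as a distributed output representation can be illustrated with a minimal sketch. It assumes four hypothetical classes (A to D) and a hand-written 7-bit code matrix; neither is taken from the paper's experiments. In practice each bit position would be predicted by a separately trained binary learner. Here the predicted bits are supplied directly, and the class is recovered by picking the nearest code word in Hamming distance.

```python
from itertools import combinations

# Illustrative assumption: 4 classes, one 7-bit code word per class.
# Each column defines one binary function that a binary learner would be trained on.
CODE_MATRIX = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of bit positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code_matrix):
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming(u, v) for u, v in combinations(code_matrix.values(), 2))


def decode(predicted_bits, code_matrix=CODE_MATRIX):
    """Return the class whose code word is closest to the predicted bit string."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], predicted_bits))


if __name__ == "__main__":
    d = min_distance(CODE_MATRIX)
    # A code with minimum distance d can correct floor((d - 1) / 2) bit errors.
    print("minimum distance:", d, "correctable errors:", (d - 1) // 2)

    # Code word for class C with its first bit flipped (one learner made a mistake).
    noisy = (1, 0, 1, 1, 0, 0, 1)
    print("decoded class:", decode(noisy))
```

Because every pair of code words in this sketch differs in four positions, a single wrong bit still leaves the correct class as the unique nearest code word. A one-bit-per-class representation has a minimum distance of only two and cannot recover from any single error in the same way.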
