Differential learning leads to efficient neural network classifiers

The authors outline a differential theory of learning for statistical pattern classification. The theory is based on classification figure-of-merit (CFM) objective functions, described by J. P. Hampshire II and A. H. Waibel (IEEE Trans. Neural Netw., vol. 1, no. 2, pp. 216-218, June 1990). They prove that differential learning is efficient, requiring the least classifier complexity and the smallest training sample size necessary to achieve Bayesian (i.e., minimum-error) discrimination. A practical application of the theory is included, in which a simple differentially trained linear neural network classifier discriminates handwritten digits of the AT&T DB1 database with a 1.3% error rate. This error rate is less than one half of the best previous result for a linear classifier on this optical character recognition (OCR) task.
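
To make the training criterion concrete, the sketch below shows gradient ascent on a sigmoidal CFM of output differences for a linear classifier. It is a minimal illustration under stated assumptions, not the authors' exact formulation or code: the particular sigmoid form, the beta and learning-rate values, and the names cfm_objective and cfm_gradient_step are all illustrative.

```python
import numpy as np

def cfm_objective(W, b, x, y, beta=4.0):
    """CFM for one sample: mean sigmoid of (correct output - competing output).

    Assumed sigmoidal CFM form, in the spirit of Hampshire and Waibel (1990);
    higher values mean the correct class is more clearly ranked first.
    """
    z = W @ x + b                        # linear discriminant outputs
    d = z[y] - np.delete(z, y)           # differences vs. every competing class
    return float(np.mean(1.0 / (1.0 + np.exp(-beta * d))))

def cfm_gradient_step(W, b, x, y, lr=0.1, beta=4.0):
    """One ascent step on the CFM objective (we maximize, so add the gradient)."""
    z = W @ x + b
    others = [j for j in range(len(z)) if j != y]
    dW, db = np.zeros_like(W), np.zeros_like(b)
    for j in others:
        s = 1.0 / (1.0 + np.exp(-beta * (z[y] - z[j])))   # sigmoid of the difference
        g = beta * s * (1.0 - s) / len(others)            # d(CFM)/d(z[y] - z[j])
        dW[y] += g * x;  db[y] += g                       # push correct output up
        dW[j] -= g * x;  db[j] -= g                       # push competitor down
    return W + lr * dW, b + lr * db

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(10, 64))   # e.g. 10 digit classes, 64-dim inputs
    b = np.zeros(10)
    x, y = rng.normal(size=64), 3               # one synthetic sample
    for _ in range(100):
        W, b = cfm_gradient_step(W, b, x, y)
    print(cfm_objective(W, b, x, y))            # rises toward 1.0 as ranking improves
```

Because the objective depends only on differences between the correct output and its competitors, it rewards correct ranking of the outputs rather than accurate posterior estimates, which is the sense in which differential learning differs from error-measure (e.g., mean-squared-error) training.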

[1] Solomon Kullback, et al. Information Theory and Statistics, 1960.

[2] Etienne Barnard, et al. A comparison between criterion functions for linear classifiers, with an application to neural nets, 1989, IEEE Trans. Syst. Man Cybern.

[3] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[4] H. Gish. A minimum classification error, maximum likelihood, neural network, 1992, Proceedings of ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] B. V. K. Vijaya Kumar, et al. Why error measures are sub-optimal for training neural network pattern classifiers, 1992, Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks.

[6] Barak A. Pearlmutter, et al. Equivalence Proofs for Multi-Layer Perceptron Classifiers and the Bayesian Discriminant Function, 1991.

[7] Biing-Hwang Juang, et al. Discriminative learning for minimum error classification [pattern recognition], 1992, IEEE Trans. Signal Process.

[8] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[9] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[10] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[11] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.

[12] Isabelle Guyon, et al. Structural Risk Minimization for Character Recognition, 1991, NIPS.

[13] Vladimir Vapnik, et al. Principles of Risk Minimization for Learning Theory, 1991, NIPS.

[14] Amro El-Jaroudi, et al. A new error criterion for posterior probability estimation with neural nets, 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[15] B. Natarajan. Machine Learning: A Theoretical Approach, 1992.

[16] Geoffrey E. Hinton. Connectionist Learning Procedures, 1990.

[17] Alexander H. Waibel, et al. A novel objective function for improved phoneme recognition using time delay neural networks, 1990, 1989 International Joint Conference on Neural Networks.

[18] B. V. K. Vijaya Kumar, et al. Shooting Craps in Search of an Optimal Strategy for Training Connectionist Pattern Classifiers, 1991, NIPS.