Efficient autonomous learning for statistical pattern recognition

We describe a neural network learning algorithm that implements differential learning in a generalized backpropagation framework. The algorithm regulates model complexity during learning, generating the best low-complexity approximation to the Bayes-optimal classifier that the training sample allows. It learns to recognize handwritten digits from the AT&T DB1 database with little human intervention: from the benchmark partitioning of the database, the algorithm generates a simple neural network classifier with 650 total parameters that exhibits a test-sample error rate of 1.3%.
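The abstract names the algorithm but does not spell out the objective it optimizes. As a minimal sketch only, the code below trains a linear two-class discriminator by gradient ascent on a sigmoidal classification figure of merit (CFM), the kind of differential objective associated with differential learning: it rewards the margin between the correct class output and its strongest rival rather than matching target values. The steepness beta, the learning rate, and the synthetic Gaussian data are illustrative assumptions, not details taken from the paper.

```python
# A sketch of a differential-learning update for a linear classifier,
# assuming a sigmoidal CFM objective. beta, lr, and the synthetic data
# are illustrative placeholders, not parameters from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data in 2-D (an illustrative stand-in for DB1 digits).
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)),
               rng.normal(+1.0, 0.5, (100, 2))])
y = np.repeat([0, 1], 100)

W = rng.normal(0.0, 0.1, (2, 2))  # one row of discriminant weights per class
b = np.zeros(2)
beta, lr = 4.0, 0.1               # CFM steepness and learning rate (assumed)

for epoch in range(50):
    for x, label in zip(X, y):
        z = W @ x + b                              # discriminant outputs
        rivals = [k for k in range(2) if k != label]
        k_star = rivals[int(np.argmax(z[rivals]))] # strongest incorrect class
        delta = z[label] - z[k_star]               # differential: correct minus rival
        s = 1.0 / (1.0 + np.exp(-beta * delta))    # CFM(delta), a sigmoid of the margin
        g = beta * s * (1.0 - s)                   # dCFM/ddelta
        # Gradient *ascent* on CFM: raise the correct output, lower the rival.
        W[label] += lr * g * x
        b[label] += lr * g
        W[k_star] -= lr * g * x
        b[k_star] -= lr * g

pred = np.argmax(X @ W.T + b, axis=1)
print("training accuracy:", (pred == y).mean())
```

Because the CFM gradient vanishes for samples classified with a large margin, updates concentrate on samples near the decision boundary, which is what distinguishes this differential objective from a squared-error or cross-entropy fit to target outputs.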
