Efficient Adaptive Learning for Classification Tasks with Binary Units

This article presents a new incremental learning algorithm for classification tasks, called NetLines, which is well adapted to both binary and real-valued input patterns. It generates small, compact feedforward neural networks with one hidden layer of binary units and binary output units. A convergence theorem ensures that solutions with a finite number of hidden units exist for both binary and real-valued input patterns. An implementation for problems with more than two classes, valid for any binary classifier, is proposed. The generalization error and the size of the resulting networks are compared to the best published results on well-known classification benchmarks. Early stopping is shown to decrease overfitting without improving generalization performance.
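
To make the constructive idea concrete, the sketch below grows a single hidden layer of binary threshold units one at a time until the training set is learned or a unit budget is exhausted. It is only an illustration under stated assumptions: the pocket-style perceptron rule, the error-correcting targets assigned to each new hidden unit, and all parameter values are assumptions for this example, not the NetLines/Minimerror procedure described in the paper.

```python
# Minimal sketch of a constructive classifier with one hidden layer of binary units.
# ASSUMPTIONS: pocket-style perceptron training, the target rule for new hidden
# units, and all hyperparameters are illustrative, not the paper's exact method.
import numpy as np

def train_unit(X, t, epochs=200, lr=0.1, rng=np.random.default_rng(0)):
    """Train one binary threshold unit with targets t in {-1, +1} (pocket rule)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])           # append a bias column
    w = rng.normal(scale=0.01, size=Xb.shape[1])
    best_w, best_err = w.copy(), np.inf
    for _ in range(epochs):
        for xi, ti in zip(Xb, t):
            if ti * (xi @ w) <= 0:                      # misclassified: perceptron update
                w = w + lr * ti * xi
        err = np.sum(np.sign(Xb @ w) != t)
        if err < best_err:                              # keep the best weights seen so far
            best_w, best_err = w.copy(), err
        if best_err == 0:
            break
    return best_w

def binary_codes(X, units):
    """Internal representation: one binary output per hidden unit."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ np.array(units).T)

def grow(X, y, max_units=10):
    """Add binary hidden units until the training set is learned or max_units is reached."""
    units = [train_unit(X, y)]                          # first unit attempts the task directly
    while True:
        H = binary_codes(X, units)
        w_out = train_unit(H, y)                        # binary output unit on the hidden codes
        pred = np.sign(np.hstack([H, np.ones((len(H), 1))]) @ w_out)
        if np.all(pred == y) or len(units) >= max_units:
            return units, w_out
        # Assumed target rule: the new unit answers with the true class on patterns
        # the current network gets wrong, and with the opposite class elsewhere.
        t_new = np.where(pred == y, -y, y)
        units.append(train_unit(X, t_new))

def predict(X, units, w_out):
    H = binary_codes(X, units)
    return np.sign(np.hstack([H, np.ones((len(H), 1))]) @ w_out)

# Toy usage: XOR-like data, which a single perceptron cannot separate.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
units, w_out = grow(X, y, max_units=5)
print(predict(X, units, w_out))
```

A multi-class extension of the kind mentioned in the abstract could wrap several such binary networks (for example, one per class with a winner-take-all decision); the specific scheme proposed in the paper is not reproduced here.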
