An efficient constrained learning algorithm with momentum acceleration

An algorithm for efficient learning in feedforward networks is presented. Momentum acceleration is achieved by solving a constrained optimization problem using nonlinear programming techniques. In particular, the usual mean square error cost function is minimized under an additional condition whose purpose is to optimize the alignment of the weight update vectors in successive epochs. The algorithm is applied to several benchmark training tasks (exclusive-or, encoder, multiplexer, and counter problems). Its performance, in terms of learning speed and scalability, is evaluated on these benchmarks and found superior to that of reputedly fast variants of the back-propagation algorithm.
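
The idea summarized above can be read as replacing the heuristic momentum term of back-propagation with coefficients obtained from a small constrained optimization at every epoch. Below is a minimal sketch of one plausible interpretation, not the authors' exact derivation: the update dw is written as a combination of the negative cost gradient and the previous epoch's update, and its two coefficients are solved from (i) a prescribed first-order decrease of the mean square error and (ii) a fixed step length, so that alignment with the previous update is retained. The parameter names dP and dQ, the helper constrained_update, and the quadratic toy cost are illustrative assumptions.

```python
import numpy as np

def constrained_update(g, dw_prev, dP=0.05, dQ=None):
    """Sketch of a constrained step: dw = -a*g + b*dw_prev with
    g . dw = -dQ (prescribed cost decrease) and ||dw|| = dP (fixed step length)."""
    G = float(g @ g)                           # squared gradient norm
    if dQ is None:
        dQ = 0.9 * dP * np.sqrt(G)             # spend most of the step budget on descent
    dQ = min(dQ, dP * np.sqrt(G))              # keep the two constraints mutually feasible
    if dw_prev is None or not dw_prev.any():
        return -(dP / np.sqrt(G)) * g          # first epoch: plain normalized gradient step
    c = float(g @ dw_prev)                     # gradient / previous-update correlation
    P2 = float(dw_prev @ dw_prev)
    denom = P2 - c * c / G
    if denom <= 1e-12:                         # previous update already parallel to g
        return -(dP / np.sqrt(G)) * g
    b = np.sqrt(max(dP ** 2 - dQ ** 2 / G, 0.0) / denom)   # momentum-like coefficient
    a = (dQ + b * c) / G                                    # gradient coefficient
    return -a * g + b * dw_prev

# Toy usage on a quadratic cost E(w) = 0.5 * w' A w standing in for the MSE surface.
# Fixed-length steps mean the iterate settles near, not exactly at, the minimum.
A = np.diag([1.0, 10.0])
w = np.array([3.0, 1.0])
dw_prev = None
for epoch in range(200):
    g = A @ w                                  # gradient of the toy cost
    dw = constrained_update(g, dw_prev)
    w, dw_prev = w + dw, dw
print("final cost:", 0.5 * w @ A @ w)
```

The coefficients a and b play the roles of an adaptive learning rate and an adaptive momentum factor; in a practical implementation they would be recomputed from the network's actual MSE gradient at every epoch.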
