Optimal learning in artificial neural networks: A review of theoretical results

Abstract. The effectiveness of connectionist models in emulating intelligent behaviour and solving significant practical problems depends closely on the capability of their learning algorithms to find optimal or near-optimal solutions and to generalize to new examples. This paper reviews theoretical contributions to optimal learning in an attempt to provide a unified view and present the state of the art in the field. The review focuses on the problem of local minima in the cost function, which is likely to affect almost any learning algorithm to some degree. Starting from this analysis, we briefly review proposals for discovering optimal solutions and suggest conditions for designing architectures tailored to a given task.
