Optimal learning in artificial neural networks: a theoretical view

The effectiveness of connectionist models in emulating intelligent behaviour and solving significant practical problems is strictly related to the capability of the learning algorithms to find optimal or near-optimal solutions and to generalize to new examples. This paper reviews some theoretical contributions to optimal learning in an attempt to provide a unified view and give the state of the art in the field. The focus of the review is on the problem of local minima in the cost function, which is likely to affect, more or less, any learning algorithm. Starting from this analysis, we briefly review proposals for discovering optimal solutions and suggest conditions for designing architectures tailored to a given task.
