Optimal learning in artificial neural networks: A theoretical view

[1]  Aimo A. Törn, et al.  Global Optimization, 1989, Lecture Notes in Computer Science.

[2]  Peter Tiño, et al.  Learning long-term dependencies in NARX recurrent neural networks, 1996, IEEE Trans. Neural Networks.

[3]  X. H. Yu, et al.  On the local minima free condition of backpropagation learning, 1995, IEEE Trans. Neural Networks.

[4]  Paolo Frasconi, et al.  Learning without local minima in radial basis function networks, 1995, IEEE Trans. Neural Networks.

[5]  Giovanni Soda, et al.  Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks, 1995, IEEE Trans. Knowl. Data Eng.

[6]  Paolo Frasconi, et al.  Learning in multilayered networks used as autoassociators, 1995, IEEE Trans. Neural Networks.

[7]  Marco Gori, et al.  On the problem of local minima in recurrent neural networks, 1994, IEEE Trans. Neural Networks.

[8]  Yoshua Bengio, et al.  Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.

[9]  Robert Hecht-Nielsen, et al.  On the Geometry of Feedforward Neural Network Error Surfaces, 1993, Neural Computation.

[10]  Russell Reed, et al.  Pruning algorithms-a survey, 1993, IEEE Trans. Neural Networks.

[11]  Robert A. Jacobs, et al.  Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[12]  Paolo Frasconi, et al.  Backpropagation for linearly-separable patterns: A detailed analysis, 1993, IEEE International Conference on Neural Networks.

[13]  Tsu-Shuan Chang, et al.  A universal neural net with guaranteed convergence to zero system error, 1992, IEEE Trans. Signal Process.

[14]  Babak Hassibi, et al.  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.

[15]  Xiao-Hu Yu, et al.  Can backpropagation error surface not have local minima, 1992, IEEE Trans. Neural Networks.

[16]  Geoffrey E. Hinton, et al.  Simplifying Neural Networks by Soft Weight-Sharing, 1992, Neural Computation.

[17]  Sandro Ridella, et al.  Statistically controlled activation weight initialization (SCAWI), 1992, IEEE Trans. Neural Networks.

[18]  Yoshua Bengio, et al.  Learning the dynamic nature of speech with back-propagation for sequences, 1992, Pattern Recognit. Lett.

[19]  Jürgen Schmidhuber, et al.  Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.

[20]  Giovanni Soda, et al.  Local Feedback Multilayered Networks, 1992, Neural Computation.

[21]  Myung Won Kim, et al.  The effect of initial weights on premature saturation in back-propagation learning, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[22]  D. Rumelhart, et al.  Generalization by weight-elimination applied to currency exchange rate prediction, 1991, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks.

[23]  Geoffrey E. Hinton, et al.  Adaptive Mixtures of Local Experts, 1991, Neural Computation.

[24]  Jing Peng, et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories, 1990, Neural Computation.

[25]  David E. Rumelhart, et al.  Generalization by Weight-Elimination with Application to Forecasting, 1990, NIPS.

[26]  T. Kohonen.  The self-organizing map, 1990, Neurocomputing.

[27]  Bernard Widrow, et al.  30 years of adaptive neural networks: perceptron, Madaline, and backpropagation, 1990, Proc. IEEE.

[28]  F. Girosi, et al.  Networks for approximation and learning, 1990, Proc. IEEE.

[29]  John J. Shynk, et al.  Performance surfaces of a single-layer perceptron, 1990, IEEE Trans. Neural Networks.

[30]  Ehud D. Karnin, et al.  A simple procedure for pruning back-propagation trained neural networks, 1990, IEEE Trans. Neural Networks.

[31]  Stephen I. Gallant, et al.  Perceptron-based learning algorithms, 1990, IEEE Trans. Neural Networks.

[32]  Marcus Frean, et al.  The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks, 1990, Neural Computation.

[33]  Chuanyi Ji, et al.  Generalizing Smoothness Constraints from Discrete Samples, 1990, Neural Computation.

[34]  Michael I. Jordan, et al.  Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks, 1990, Cogn. Sci.

[35]  Piero Cosi, et al.  Phonetically-based multi-layered neural networks for vowel classification, 1990, Speech Commun.

[36]  E. K. Blum, et al.  Approximation of Boolean Functions by Sigmoidal Networks: Part I: XOR and Other Two-Variable Functions, 1989, Neural Computation.

[37]  Eduardo D. Sontag, et al.  Backpropagation separates when perceptrons do, 1989, International 1989 Joint Conference on Neural Networks.

[38]  Geoffrey E. Hinton.  Connectionist Learning Procedures, 1989, Artif. Intell.

[39]  Kurt Hornik, et al.  Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[40]  J. Nadal, et al.  Learning in feedforward layered networks: the tiling algorithm, 1989.

[41]  John Moody, et al.  Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[42]  Ken-ichi Funahashi, et al.  On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.

[43]  J. Slawny, et al.  Back propagation fails to separate where perceptrons succeed, 1989.

[44]  Alexander H. Waibel, et al.  Modular Construction of Time-Delay Neural Networks for Speech Recognition, 1989, Neural Computation.

[45]  Geoffrey E. Hinton, et al.  Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[46]  Esther Levin, et al.  Accelerated Learning in Layered Neural Networks, 1988, Complex Syst.

[47]  D. R. Hush, et al.  Improving the learning rate of back-propagation with the gradient reuse algorithm, 1988, IEEE 1988 International Conference on Neural Networks.

[48]  D. Zipser, et al.  Learning the hidden structure of speech, 1988, The Journal of the Acoustical Society of America.

[49]  J. Fodor, et al.  Connectionism and cognitive architecture: A critical analysis, 1988, Cognition.

[50]  Geoffrey E. Hinton, et al.  Learning sets of filters using back-propagation, 1987.

[51]  Geoffrey E. Hinton, et al.  Learning internal representations by error propagation, 1986.

[52]  J. J. Hopfield, et al.  Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[53]  D. Rubin, et al.  Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[54]  Marco Gori, et al.  Optimal convergence of on-line backpropagation, 1996, IEEE Trans. Neural Networks.

[55]  Alberto Tesi, et al.  On the Problem of Local Minima in Backpropagation, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[56]  George Cybenko, et al.  Approximation by superpositions of a sigmoidal function, 1992, Math. Control Signals Syst.

[57]  David E. Rumelhart, et al.  Back-Propagation, Weight-Elimination and Time Series Prediction, 1991.

[58]  Yih-Fang Huang, et al.  Bounds on the number of hidden neurons in multilayer perceptrons, 1991, IEEE Trans. Neural Networks.

[59]  Pietro Burrascano, et al.  A norm selection criterion for the generalized delta rule, 1991, IEEE Trans. Neural Networks.

[60]  Hervé Bourlard, et al.  Speech pattern discrimination and multilayer perceptrons, 1989.

[61]  Kurt Hornik, et al.  Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.

[62]  Eduardo D. Sontag, et al.  Backpropagation Can Give Rise to Spurious Local Minima Even for Networks without Hidden Layers, 1989, Complex Syst.

[63]  Yann LeCun, et al.  Optimal Brain Damage, 1989, NIPS.

[64]  R. Hecht-Nielsen, et al.  Back propagation error surfaces can have local minima, 1989, International 1989 Joint Conference on Neural Networks.

[65]  Yann LeCun, et al.  Generalization and network design strategies, 1989.

[66]  Christian Lebiere, et al.  The Cascade-Correlation Learning Architecture, 1989, NIPS.

[67]  Robert Hecht-Nielsen, et al.  Theory of the backpropagation neural network, 1989, International 1989 Joint Conference on Neural Networks.

[68]  Bernard Widrow, et al.  Adaptive switching circuits, 1988.

[69]  Yves Chauvin, et al.  A Back-Propagation Algorithm with Optimal Use of Hidden Units, 1988, NIPS.

[70]  Michael C. Mozer, et al.  Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, 1988, NIPS.

[71]  Eric B. Baum, et al.  Supervised Learning of Probability Distributions by Neural Networks, 1987, NIPS.

[72]  Y. Le Cun.  Learning Process in an Asymmetric Threshold Network, 1986.

[73]  Geoffrey E. Hinton, et al.  A Learning Algorithm for Boltzmann Machines, 1985, Cogn. Sci.

[74]  P. Werbos, et al.  Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.