Efficient BackProp

[1] Shun-ichi Amari, et al. Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons, 1998, Neural Computation.

[2] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[3] Shun-ichi Amari, et al. The Efficiency and the Robustness of Natural Gradient Descent Learning Rule, 1997, NIPS.

[4] Genevieve B. Orr, et al. Removing Noise in On-Line Search using Adaptive Batch Sizes, 1996, NIPS.

[5] Shun-ichi Amari, et al. Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient, 1996, NIPS.

[6] Andreas Ziehe, et al. Adaptive On-line Learning in Changing Environments, 1996, NIPS.

[7] Saad, et al. Exact solution for on-line learning in multilayer neural networks, 1995, Physical Review Letters.

[8] Mark J. L. Orr, et al. Regularization in the Selection of Radial Basis Function Centers, 1995, Neural Computation.

[9] W. Wiegerinck, et al. Stochastic dynamics of learning with momentum in neural networks, 1994.

[10] Wray L. Buntine, et al. Computing second derivatives in feed-forward networks: a review, 1994, IEEE Trans. Neural Networks.

[11] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.

[12] J. G. Taylor, et al. Mathematical Approaches to Neural Networks, 1993.

[13] Martin Fodslette Møller, et al. Supervised Learning On Large Redundant Training Sets, 1993, Int. J. Neural Syst.

[14] Barak A. Pearlmutter, et al. Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors, 1992, NIPS.

[15] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.

[16] Roberto Battiti, et al. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, 1992, Neural Computation.

[17] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[18] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.

[19] M. F. Møller. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, 1990.

[20] John E. Moody, et al. Note on Learning Rate Schedules for Stochastic Optimization, 1990, NIPS.

[21] John Moody, et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[22] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[23] R. Fletcher. Practical Methods of Optimization, 1988.

[24] R. Jacobs. Increased rates of convergence through learning rate adaptation, 1987, Neural Networks.

[25] Yann LeCun. Modèles connexionnistes de l'apprentissage (Connectionist Learning Models), PhD thesis, 1987.

[26] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[27] Vladimir Cherkassky, et al. Statistical learning theory, 1998.

[28] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.

[29] Haim Sompolinsky, et al. On-line Learning of Dichotomies: Algorithms and Learning Curves, 1995, NIPS.

[30] Patrick van der Smagt. Minimisation methods for training feedforward neural networks, 1994, Neural Networks.

[31] Hilbert J. Kappen, et al. On-line learning processes in artificial neural networks, 1993.

[32] Yann LeCun, et al. Second Order Properties of Error Surfaces, 1990, NIPS.

[33] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[34] Yann LeCun, et al. Generalization and network design strategies, 1989.

[35] Alberto L. Sangiovanni-Vincentelli, et al. Efficient Parallel Learning Algorithms for Neural Networks, 1988, NIPS.

[36] G. Golub. Matrix computations, 1983.