Efficient BackProp
Klaus-Robert Müller | Yann LeCun | Léon Bottou | Genevieve B. Orr
[1] Shun-ichi Amari, et al. Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons, 1998, Neural Computation.
[2] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[3] Shun-ichi Amari, et al. The Efficiency and the Robustness of Natural Gradient Descent Learning Rule, 1997, NIPS.
[4] Genevieve B. Orr, et al. Removing Noise in On-Line Search using Adaptive Batch Sizes, 1996, NIPS.
[5] Shun-ichi Amari, et al. Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient, 1996, NIPS.
[6] Andreas Ziehe, et al. Adaptive On-line Learning in Changing Environments, 1996, NIPS.
[7] Saad, et al. Exact Solution for On-line Learning in Multilayer Neural Networks, 1995, Physical Review Letters.
[8] Mark J. L. Orr, et al. Regularization in the Selection of Radial Basis Function Centers, 1995, Neural Computation.
[9] W. Wiegerinck, et al. Stochastic Dynamics of Learning with Momentum in Neural Networks, 1994.
[10] Wray L. Buntine, et al. Computing Second Derivatives in Feed-forward Networks: A Review, 1994, IEEE Trans. Neural Networks.
[11] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[12] J. G. Taylor, et al. Mathematical Approaches to Neural Networks, 1993.
[13] Martin Fodslette Møller, et al. Supervised Learning On Large Redundant Training Sets, 1993, Int. J. Neural Syst.
[14] Barak A. Pearlmutter, et al. Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors, 1992, NIPS.
[15] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[16] Roberto Battiti, et al. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, 1992, Neural Computation.
[17] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.
[18] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[19] M. F. Møller. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, 1990.
[20] John E. Moody, et al. Note on Learning Rate Schedules for Stochastic Optimization, 1990, NIPS.
[21] John Moody, et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.
[22] Geoffrey E. Hinton, et al. Phoneme Recognition Using Time-Delay Neural Networks, 1989, IEEE Trans. Acoust. Speech Signal Process.
[23] R. Fletcher. Practical Methods of Optimization, 1988.
[24] R. Jacobs. Increased Rates of Convergence Through Learning Rate Adaptation, 1987, Neural Networks.
[25] Yann LeCun. PhD thesis: Modèles connexionnistes de l'apprentissage (connectionist learning models), 1987.
[26] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[27] Vladimir Cherkassky, et al. Statistical Learning Theory, 1998.
[28] Christopher M. Bishop, et al. Neural Networks for Pattern Recognition, 1995.
[29] Haim Sompolinsky, et al. On-line Learning of Dichotomies: Algorithms and Learning Curves, 1995, NIPS.
[30] Patrick van der Smagt. Minimisation Methods for Training Feedforward Neural Networks, 1994, Neural Networks.
[31] Hilbert J. Kappen, et al. On-line Learning Processes in Artificial Neural Networks, 1993.
[32] Yann LeCun, et al. Second Order Properties of Error Surfaces, 1990, NIPS.
[33] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[34] Yann LeCun, et al. Generalization and Network Design Strategies, 1989.
[35] Alberto L. Sangiovanni-Vincentelli, et al. Efficient Parallel Learning Algorithms for Neural Networks, 1988, NIPS.
[36] G. Golub. Matrix Computations, 1983.