Paper information - The general inefficiency of batch training for gradient descent learning - 字舞流文

The general inefficiency of batch training for gradient descent learning

Tony R. Martinez | D. Randall Wilson

[1] Philipp Slusallek, et al. Introduction to real-time ray tracing, 2005, SIGGRAPH Courses.

[2] Teuvo Kohonen, et al. Self-organized formation of topologically correct feature maps, 2004, Biological Cybernetics.

[3] Tony R. Martinez, et al. The need for small learning rates on large problems, 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).

[4] Ali Zilouchian, et al. Fundamentals of Neural Networks, 2001.

[5] Amir F. Atiya, et al. New results on recurrent network training: unifying the algorithms and accelerating convergence, 2000, IEEE Trans. Neural Networks Learn. Syst.

[6] Jose C. Principe, et al. Neural and adaptive systems, 2000.

[7] J. Nazuno. Haykin, Simon. Neural networks: A comprehensive foundation, Prentice Hall, Inc. Segunda Edición, 1999, 2000.

[8] J. Spall. Stochastic Optimization, Stochastic Approximation and Simulated Annealing, 1999.

[9] Enrico Gobbetti, et al. Encyclopedia of Electrical and Electronics Engineering, 1999.

[10] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[11] M.H. Hassoun, et al. Fundamentals of Artificial Neural Networks, 1996, Proceedings of the IEEE.

[12] Yoshua Bengio, et al. Neural networks for speech and sequence recognition, 1996.

[13] Laurene V. Fausett, et al. Fundamentals of Neural Networks, 1994.

[14] S. Haykin, et al. Neural Networks: A Comprehensive Foundation, 1994.

[15] Martin Fodslette Møller, et al. A scaled conjugate gradient algorithm for fast supervised learning, 1993, Neural Networks.

[16] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[17] Philip D. Wasserman, et al. Advanced methods in neural computing, 1993, VNR computer library.

[18] Biing-Hwang Juang, et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.

[19] Yoshua Bengio, et al. Artificial neural networks and their application to sequence recognition, 1991.

[20] Paul Glasserman, et al. Gradient Estimation Via Perturbation Analysis, 1990.

[21] Sholom M. Weiss, et al. Computer Systems That Learn, 1990.

[22] Françoise Fogelman-Soulié, et al. Speaker-independent isolated digit recognition: Multilayer perceptrons vs. Dynamic time warping, 1990, Neural Networks.

[23] Geoffrey E. Hinton, et al. Proceedings of the 1988 Connectionist Models Summer School, 1989.

[24] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.

[25] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[26] T. Kohonen. Self-organized formation of topographically correct feature maps, 1982.

[27] Frank Rosenblatt, et al. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, 1963.

[28] J. Kiefer, et al. Stochastic Estimation of the Maximum of a Regression Function, 1952.