The general inefficiency of batch training for gradient descent learning
