Stochastic Gradient Learning in Neural Networks

Many connectionist learning algorithms consist of minimizing a cost of the form $C(w) = \mathbb{E}_z[J(z,w)] = \int J(z,w)\,dP(z)$, where $dP$ is an unknown probability distribution that characterizes the problem to learn, and $J$, the loss function, defines the learning system itself. This popular statistical formulation has led to many theoretical results. The minimization of such a cost may be achieved with a stochastic gradient descent algorithm, e.g. $w_{t+1} = w_t - \epsilon_t \nabla_w J(z_t, w_t)$. Under some restrictions on $J$ and $C$, this algorithm converges even if $J$ is non-differentiable on a set of measure zero. Links with simulated annealing are also discussed.
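The following is a minimal sketch of the stochastic gradient update above, under assumed details not specified in the text: a linear learner with squared-error loss $J(z,w) = \tfrac{1}{2}(w \cdot x - y)^2$ for examples $z = (x, y)$, a synthetic data distribution standing in for the unknown $dP$, and a decreasing gain $\epsilon_t$.

```python
import numpy as np

# Stochastic gradient descent sketch (hypothetical setup):
#   example z = (x, y), loss J(z, w) = 0.5 * (w.x - y)^2,
#   so grad_w J(z, w) = (w.x - y) * x.
rng = np.random.default_rng(0)

# Synthetic problem: examples drawn from an (otherwise unknown) distribution dP(z).
w_true = np.array([2.0, -1.0, 0.5])

def sample_z():
    x = rng.normal(size=3)
    y = w_true @ x + 0.1 * rng.normal()
    return x, y

def grad_J(z, w):
    x, y = z
    return (w @ x - y) * x  # gradient of 0.5*(w.x - y)^2 with respect to w

w = np.zeros(3)
eps0 = 0.1
for t in range(10_000):
    z = sample_z()                    # draw one example z_t from dP
    eps_t = eps0 / (1.0 + 0.001 * t)  # decreasing gain epsilon_t
    w = w - eps_t * grad_J(z, w)      # w_{t+1} = w_t - eps_t * grad_w J(z_t, w_t)

print("estimated w:", w)  # approaches w_true as the updates accumulate
```

Each step uses the gradient of the loss on a single randomly drawn example rather than the full expectation $C(w)$, which is exactly what makes the update stochastic; the decaying $\epsilon_t$ is one common choice consistent with the convergence conditions alluded to in the abstract.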