Parametric Statistical Estimation with Artificial Neural Networks: A Condensed Discussion

Learning in artificial neural networks is a process by which experience, gained from exposure to measurements of empirical phenomena, is converted into knowledge embodied in the network weights. This process can be viewed formally as statistical estimation of the parameters of a parametrized probability model. We exploit this formal viewpoint to give a unified theory of learning in artificial neural networks, encompassing both supervised and unsupervised learning in either feedforward or recurrent networks. We begin by describing various objects appropriate for learning, such as conditional means, variances, or quantiles, and conditional densities. We then show how artificial neural networks can be viewed as parametric statistical models directed toward these objects of interest. In particular, we show how a probability density can be associated with the output of any network, and we use this density to define the network weights that index an information-theoretically optimal approximation to the object of interest. We next study the statistical properties of quasi-maximum likelihood estimators consistent for these optimal weights, including issues of statistical inference about the optimal weights. Finally, we consider computational methods for obtaining the estimators, with special attention to extensions of the method of back-propagation.
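To make the estimation target concrete, the display below states the Kullback-Leibler sense in which the optimal weights are "information-theoretically optimal" and the sample analogue that the quasi-maximum likelihood estimator maximizes. The notation (g, f, w*, ŵ_n) is chosen here for illustration and is not necessarily the paper's own.

```latex
% g(y | x): true conditional density; f(y | x; w): density induced by the
% network output. (Illustrative notation, not necessarily the paper's.)
\begin{align*}
w^{*} &= \arg\min_{w}\; \mathbb{E}\!\left[\log \frac{g(Y \mid X)}{f(Y \mid X; w)}\right]
       = \arg\max_{w}\; \mathbb{E}\!\left[\log f(Y \mid X; w)\right],\\
\hat{w}_{n} &= \arg\max_{w}\; \frac{1}{n}\sum_{t=1}^{n} \log f(Y_{t} \mid X_{t}; w),
\end{align*}
% so the quasi-maximum likelihood estimator \hat{w}_{n} is, under regularity
% conditions, consistent for the KL-optimal weights w^{*}.
```

A minimal sketch of how this plays out computationally follows, assuming the simplest case: a Gaussian density with fixed variance attached to the output of a one-hidden-layer network, so that maximizing the quasi-log-likelihood reduces to squared-error back-propagation. All names (`fit_qmle`, `n_hidden`, and so on) are invented for illustration; this is not the paper's code.

```python
# Hedged sketch: Gaussian quasi-maximum likelihood estimation of the weights
# of a one-hidden-layer tanh network by plain gradient descent (back-prop).
import numpy as np

rng = np.random.default_rng(0)

def forward(w, X):
    """Network output (the conditional-mean model) and hidden activations."""
    W1, b1, W2, b2 = w
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2, H

def neg_quasi_loglik(w, X, y, sigma2=1.0):
    """Average negative Gaussian quasi-log-likelihood (up to a constant)."""
    mu, _ = forward(w, X)
    return 0.5 * np.mean((y - mu) ** 2) / sigma2

def fit_qmle(X, y, n_hidden=5, lr=0.05, n_epochs=2000):
    n_in = X.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    n = len(y)
    for _ in range(n_epochs):
        # forward pass
        H = np.tanh(X @ W1 + b1)
        mu = H @ W2 + b2
        # back-propagate the gradient of the negative quasi-log-likelihood
        d_mu = (mu - y) / n                  # d(loss)/d(mu)
        dW2 = H.T @ d_mu
        db2 = d_mu.sum(axis=0)
        dH = d_mu @ W2.T * (1.0 - H ** 2)    # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # gradient-descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2)

# Toy usage: recover a smooth conditional mean from noisy observations.
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=(200, 1))
w_hat = fit_qmle(X, y)
print("in-sample neg. quasi-log-lik:", neg_quasi_loglik(w_hat, X, y))
```

Because the quasi-likelihood here is Gaussian with fixed variance, the gradient coincides with the familiar squared-error back-propagation gradient; targeting a different object of interest (a conditional variance, quantile, or full density) would change only the quasi-likelihood and the `d_mu` line, which is the sense in which the framework extends back-propagation.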
