On the distribution of performance from multiple neural-network trials

The performance of neural network simulations is often reported as the mean and standard deviation over a number of runs with different starting conditions. In many cases, however, the distribution of the individual results does not approximate a Gaussian: it may be asymmetric or even multimodal. We present the distributions of results for several practical problems and show that assuming Gaussian distributions can significantly affect the interpretation of results, especially in comparison studies. For a controlled task that we consider, we find that the distribution of performance is skewed toward better performance for smoother target functions and toward worse performance for more complex target functions. We propose new guidelines for reporting performance that convey more information about the actual distribution.
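As a minimal illustration of the reporting problem the abstract raises, the sketch below (not from the paper; the skewed error model and the helper name are assumptions for demonstration) draws a set of hypothetical trial results from a skewed distribution and contrasts the usual mean ± standard deviation summary with a quartile-based summary, which is closer to the kind of distribution-aware reporting the authors advocate:

```python
import random
import statistics

def summarize_trials(results):
    """Summarize per-trial performance two ways: Gaussian-style
    (mean, std) and distribution-aware (quartiles, min/max)."""
    qs = statistics.quantiles(results, n=4)  # [Q1, median, Q3]
    return {
        "mean": statistics.mean(results),
        "std": statistics.stdev(results),
        "min": min(results),
        "q1": qs[0],
        "median": qs[1],
        "q3": qs[2],
        "max": max(results),
    }

random.seed(0)
# Hypothetical per-trial errors: lognormal, i.e. skewed toward
# worse (larger) values, as the paper observes for complex targets.
results = [random.lognormvariate(mu=0.0, sigma=0.75) for _ in range(50)]
s = summarize_trials(results)

# For skewed results the two summaries tell different stories:
# mean ± std suggests symmetry, while the quartiles expose the skew.
print(f"mean ± std : {s['mean']:.2f} ± {s['std']:.2f}")
print(f"median/IQR : {s['median']:.2f} [{s['q1']:.2f}, {s['q3']:.2f}]")
print(f"range      : [{s['min']:.2f}, {s['max']:.2f}]")
```

Reporting the quartiles (or a histogram of all trials) avoids the misleading implication that roughly 68% of runs fall within one standard deviation of the mean, which holds only under the Gaussian assumption the paper questions.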
