Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network

Multi-layered perceptron networks (MLPs) have been proven to be universal approximators. However, to take advantage of this theoretical result, we must determine the smallest number of units in the hidden layer, which we denote H. Two basic, theoretically established requirements are that an adequate activation function be selected and a proper training algorithm be applied. We must also guarantee that (a) the training data comply with the demands of the universal approximation theorem (UAT) and (b) the amount of information present in the training data be determined. We discuss how to preprocess the data in order to meet such demands. Once this is done, a closed formula to determine H may be applied. Knowing H implies that any unknown function associated with the training data may, in practice, be approximated arbitrarily well by an MLP. We take advantage of previous work in which a complexity-regularization approach sought to minimize the RMS training error. In that work, an algebraic expression for H was sought by sequential trial and error. In contrast, here we find a closed formula $H=f(m_{O}, N)$, where $m_{O}$ is the number of units in the input layer and $N$ is the effective size of the training data. The algebraic expression we derive stems from statistically determined lower bounds of H over a range of interest of the $(m_{O}, N)$ pairs. The resulting sequence of 4250 triples $(H, m_{O}, N)$ is replaced by a single 12-term bivariate polynomial. To determine its 12 coefficients and the degrees of the 12 associated terms, a genetic algorithm was applied. The validity of the resulting formula is tested by determining the architectures of twelve MLPs for as many problems and verifying that the RMS error is minimal when H is obtained from the formula.
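To make the shape of the result concrete, the sketch below evaluates a formula of the kind described above: a 12-term bivariate polynomial in $m_{O}$ and $N$, rounded up to an integer number of hidden units. The coefficients and exponents used here are placeholders chosen only to illustrate the evaluation; they are not the values reported in the paper, which obtains the actual coefficients and term degrees with a genetic algorithm.

```python
import math

# Hypothetical sketch of a closed formula of the form
#     H = sum_k  c_k * m_O**a_k * N**b_k   (12 terms),
# rounded up to the next integer.  The coefficients c_k and exponent pairs
# (a_k, b_k) below are PLACEHOLDERS for illustration only; they are not the
# values derived in the paper.
PLACEHOLDER_TERMS = [
    # (c_k,   a_k, b_k)
    (1.5,     0, 0), (0.8,     1, 0), (0.01,    0, 1), (0.002,   1, 1),
    (0.05,    2, 0), (-2e-6,   0, 2), (1e-4,    2, 1), (-5e-7,   1, 2),
    (0.001,   3, 0), (1e-9,    0, 3), (-1e-6,   3, 1), (1e-8,    2, 2),
]

def hidden_units(m_o: int, n_eff: int, terms=PLACEHOLDER_TERMS) -> int:
    """Evaluate H = f(m_O, N) for a given set of polynomial terms."""
    h = sum(c * (m_o ** a) * (n_eff ** b) for c, a, b in terms)
    return max(1, math.ceil(h))  # at least one hidden unit

# Example: an MLP with 8 input units and an effective training-set size of 500.
print(hidden_units(m_o=8, n_eff=500))
```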
