Solving the linear interval tolerance problem for weight initialization of neural networks

Determining good initial conditions for the algorithm used to train a neural network can be viewed as a parameter estimation problem involving uncertainty about the initial weights. Interval analysis approaches model such uncertainty by representing parameters as intervals and formulating tolerance problems. Solving a tolerance problem means defining the lower and upper bounds of the intervals so that system functionality is guaranteed within predefined limits. The aim of this paper is to show how the problem of determining the initial weight intervals of a neural network can be formulated as a linear interval tolerance problem. The proposed linear interval tolerance approach copes with uncertainty about the initial weights without any prior knowledge or specific assumptions about the input data, as required by approaches such as fuzzy sets or rough sets. The proposed method is tested on a number of well-known benchmarks for neural networks trained with the back-propagation family of algorithms. Its efficiency is evaluated with regard to standard performance measures, and the results are compared against those of a number of well-known and established initialization methods. These results provide credible evidence that the proposed method outperforms classical weight initialization methods.
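The central object in this formulation is the linear interval tolerance problem: given an interval matrix [A] and an interval vector [b], find points x whose image Ax stays inside [b] for every A in [A] (the so-called tolerable solution set). As a minimal sketch of that idea, and not the paper's actual initialization algorithm, the following code checks whether a candidate weight vector lies in the tolerable solution set of a toy interval system; the matrices, bounds, and candidate vectors are purely illustrative assumptions.

```python
import numpy as np

def interval_matvec(A_lo, A_hi, x):
    """Tight interval enclosure of {A @ x : A_lo <= A <= A_hi} for a point x.

    Splitting x by sign gives the exact bounds, since each interval entry
    [a_lo, a_hi] scaled by x_j >= 0 yields [a_lo*x_j, a_hi*x_j] and by
    x_j < 0 yields [a_hi*x_j, a_lo*x_j]."""
    pos = np.maximum(x, 0.0)  # non-negative components of x
    neg = np.minimum(x, 0.0)  # non-positive components of x
    lo = A_lo @ pos + A_hi @ neg
    hi = A_hi @ pos + A_lo @ neg
    return lo, hi

def in_tolerable_set(A_lo, A_hi, b_lo, b_hi, x):
    """True iff A @ x lies in [b_lo, b_hi] for every A in [A_lo, A_hi]."""
    lo, hi = interval_matvec(A_lo, A_hi, x)
    return bool(np.all(lo >= b_lo) and np.all(hi <= b_hi))

# Toy 2x2 interval system (illustrative values only).
A_lo = np.array([[1.0, 0.0], [0.0, 1.0]])
A_hi = np.array([[1.1, 0.1], [0.1, 1.1]])
b_lo = np.array([-1.0, -1.0])
b_hi = np.array([1.0, 1.0])

print(in_tolerable_set(A_lo, A_hi, b_lo, b_hi, np.array([0.5, -0.5])))  # True
print(in_tolerable_set(A_lo, A_hi, b_lo, b_hi, np.array([2.0, 0.0])))   # False
```

In the paper's setting, the weight intervals would be chosen so that the network's pre-activations remain within a tolerance band for all inputs; the membership test above is only the basic building block of such a construction.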
