Architectural Parameter-Independent Network Initialization Scheme for Sigmoidal Feedforward ANNs

The choice of initial network weights is known to be a key factor affecting the convergence of artificial neural networks that use sigmoidal activation functions. In this paper, a new network initialization scheme is proposed that sets the initial weights so that the activation functions in the network are not saturated at the start of training. The proposed method ensures that the initial outputs of the hidden neurons lie in the active region, which positively impacts the network's rate of convergence. Unlike most earlier initialization schemes, this method does not depend on architectural parameters such as the size of the input layer or the hidden layer. The performance of the proposed scheme is compared against eight well-known weight initialization routines on six real-world benchmark problems. Results show that the proposed routine enables the network to achieve better performance within the same number of training epochs. A right-tailed t-test also shows that the proposed scheme is significantly better than the other techniques in most cases and statistically similar in a few cases, but never underperforms. Hence, it may be considered a strong alternative to conventional neural network initialization techniques.
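To make the notion of initial saturation concrete, the following minimal sketch measures what fraction of hidden-unit outputs fall outside an "active region" for two ordinary uniform initializations. It is not the scheme proposed in the paper, whose details the abstract does not give. The helper `saturated_fraction`, the thresholds 0.1 and 0.9, the weight ranges, and the synthetic inputs are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid; its derivative s * (1 - s) shrinks as s nears 0 or 1."""
    return 1.0 / (1.0 + np.exp(-z))


def saturated_fraction(weights, biases, inputs, low=0.1, high=0.9):
    """Fraction of hidden outputs outside [low, high].

    The thresholds are hypothetical; they only stand in for an "active region".
    """
    outputs = sigmoid(inputs @ weights + biases)
    return float(np.mean((outputs < low) | (outputs > high)))


rng = np.random.default_rng(0)
n_inputs, n_hidden, n_samples = 20, 10, 500

# Synthetic inputs scaled to [-1, 1] (an assumption, not the paper's data).
inputs = rng.uniform(-1.0, 1.0, size=(n_samples, n_inputs))

# Wide initialization: large net inputs push many sigmoids toward 0 or 1.
wide_w = rng.uniform(-5.0, 5.0, size=(n_inputs, n_hidden))
wide_b = rng.uniform(-5.0, 5.0, size=n_hidden)

# Narrow initialization: net inputs stay small, so outputs stay near 0.5.
narrow_w = rng.uniform(-0.1, 0.1, size=(n_inputs, n_hidden))
narrow_b = rng.uniform(-0.1, 0.1, size=n_hidden)

wide_frac = saturated_fraction(wide_w, wide_b, inputs)
narrow_frac = saturated_fraction(narrow_w, narrow_b, inputs)

print(f"saturated fraction, wide init:   {wide_frac:.2f}")
print(f"saturated fraction, narrow init: {narrow_frac:.2f}")
```

With the narrow range the net input magnitude cannot exceed 20 × 0.1 + 0.1 = 2.1, which is below ln 9 ≈ 2.197, the point where the sigmoid reaches 0.9, so no unit is counted as saturated. Note that this bound depends on the number of inputs, which is exactly the kind of architectural dependence the abstract says the proposed scheme avoids.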
