A comparison of methods to avoid overfitting in neural networks training in the case of catchment runoff modelling

Summary: Artificial neural networks (ANNs) have become a very popular tool in hydrology, especially in rainfall–runoff modelling. However, a number of issues must be addressed to apply this technique to a particular problem efficiently, including the selection of the network type, its architecture, a proper optimization algorithm and a method to deal with overfitting of the data. The present paper addresses the last, rarely considered issue: it compares methods to prevent multi-layer perceptron neural networks from overfitting the training data in the case of daily catchment runoff modelling. Among the methods to avoid overfitting, early stopping, noise injection and weight decay have been known for about two decades; however, only the first is frequently applied in practice. Recently a new methodology called the optimized approximation algorithm has been proposed in the literature. Overfitting of the training data degrades the generalization properties of the model and makes its performance untrustworthy when applied to novel measurements. Hence the purpose of methods to avoid overfitting is somewhat contradictory to the goal of optimization algorithms, which aim at finding the best possible solution in parameter space according to a pre-defined objective function and the available data. Moreover, different optimization algorithms may perform better for simpler or larger ANN architectures. This suggests the importance of properly coupling optimization algorithms, ANN architectures and methods to avoid overfitting of real-world data, an issue that is also studied in detail in the present paper. The study is performed for the Annapolis River catchment, characterized by significant seasonal changes in runoff, rapid floods during winter and spring, moderately dry summers, severe winters with snowfall, snow melting, frequent freeze and thaw, and the presence of river ice.
The present paper shows that the elaborated noise injection method may prevent overfitting slightly better than the most popular early stopping approach. However, applying noise injection to real-world problems is difficult, and the final model performance depends significantly on a number of very technical details, which somewhat limits its practical applicability. It is shown that the optimized approximation algorithm does not improve on the results obtained by the older methods, possibly because of an over-simplified criterion for stopping the algorithm. Extensive calculations reveal that the Evolutionary Computation-based algorithm performs better for simpler ANN architectures, whereas the classical gradient-based Levenberg–Marquardt algorithm is able to benefit from additional input variables, representing precipitation and snow cover from one more previous day, and from more complicated ANN architectures. This confirms that the curse of dimensionality has a severe impact on the performance of Evolutionary Computation methods.
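
The three older methods named in the summary can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names, the Gaussian form of the noise, the L2 form of the penalty and the patience-based stopping rule are all assumptions chosen for illustration.

```python
import numpy as np


def add_input_noise(x, sigma, rng):
    """Noise injection (assumed form): return a copy of the training
    inputs perturbed by zero-mean Gaussian noise with std `sigma`."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def weight_decay_objective(mse, weights, lam):
    """Weight decay (assumed form): training error plus an L2 penalty
    on the network weights, scaled by the hypothetical factor `lam`."""
    return mse + lam * float(np.sum(weights ** 2))


def early_stopping_epoch(val_errors, patience):
    """Early stopping (assumed rule): return the index of the epoch with
    the lowest validation error seen before `patience` consecutive
    epochs pass without improvement."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch
```

With the hypothetical validation errors `[1.0, 0.8, 0.7, 0.75, 0.9, 0.6]` and `patience=2`, training would stop after the fifth epoch and keep the weights from epoch index 2. A larger patience of 3 would let training continue to the final, lower error. This sensitivity to a single setting is one small example of the technical details that influence the final model.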
