Groundwater Quality Modeling with a Small Data Set

Seventeen groundwater quality variables collected over an 8-year period (2006 to 2013) in Andimeshk, Iran, were used to develop an artificial neural network (NN) for constructing a water quality index (WQI). The approach used to derive the WQI avoids instability and overparameterization, two problems common to relatively small data sets. The input variables were selected using principal component analysis (PCA), which reduced their number from seventeen to six. Three methods were then compared: (1) bootstrap aggregation with early stopping; (2) noise injection; and (3) ensemble averaging with early stopping. Performance was assessed using the mean squared error (MSE) and the coefficient of determination (R²) on the test data set, together with the correlation coefficient between the WQI targets and the NN predictions. The study confirmed the importance of PCA for variable selection and dimensionality reduction as a means of reducing the risk of overfitting. Ensemble averaging with early stopping was the best-performing method. Given its high coefficient of determination (R² = 0.80) and correlation coefficient (r = 0.91), we recommend ensemble averaging with early stopping as an accurate NN modeling procedure for water quality prediction in similar studies.
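The abstract does not state the network architecture, the training algorithm, or how the ensemble members differ, so the following is only a minimal sketch of ensemble averaging with early stopping under loudly stated assumptions: a one-hidden-layer tanh network trained by per-sample gradient descent, members that differ only in their random initial weights, a held-out validation set that triggers early stopping, and synthetic two-input data standing in for the PCA-selected groundwater variables. It also computes the three evaluation criteria named above (MSE, R², and Pearson r) on a test set. All function names, data, and hyperparameters (`n_hidden`, `lr`, `patience`, the number of members) are hypothetical.

```python
import copy
import math
import random


def forward(params, x):
    """One-hidden-layer tanh network with a linear output unit; returns (y, hidden activations)."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sum(w * hj for w, hj in zip(w2, h)) + b2, h


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def train_early_stopping(train, val, seed, n_hidden=4, lr=0.05, max_epochs=500, patience=20):
    """Hypothetical trainer: per-sample gradient descent on squared error. Keeps the
    weights with the lowest validation MSE and stops after `patience` epochs without improvement."""
    rng = random.Random(seed)  # the seed gives each ensemble member different initial weights
    n_in = len(train[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    params = [w1, b1, w2, 0.0]  # params[3] is the output bias
    best, best_err, stale = copy.deepcopy(params), float("inf"), 0
    for _ in range(max_epochs):
        order = list(range(len(train)))
        rng.shuffle(order)
        for i in order:
            x, t = train[i]
            y, h = forward(params, x)
            err = y - t
            for j in range(n_hidden):
                da = err * w2[j] * (1.0 - h[j] ** 2)  # backprop through tanh, using w2[j] before its update
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * da
                for k in range(n_in):
                    w1[j][k] -= lr * da * x[k]
            params[3] -= lr * err
        val_err = mse([t for _, t in val], [forward(params, x)[0] for x, _ in val])
        if val_err < best_err:
            best, best_err, stale = copy.deepcopy(params), val_err, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best


def ensemble_predict(members, x):
    """Ensemble averaging: the prediction is the mean of the members' outputs."""
    return sum(forward(p, x)[0] for p in members) / len(members)


def make_sample(rng):
    """Synthetic stand-in data (not the Andimeshk measurements)."""
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    return x, 0.5 * x[0] - 0.3 * x[1] + rng.gauss(0, 0.05)


data_rng = random.Random(42)
train = [make_sample(data_rng) for _ in range(40)]
val = [make_sample(data_rng) for _ in range(20)]
test = [make_sample(data_rng) for _ in range(30)]

members = [train_early_stopping(train, val, seed=s) for s in range(5)]
y_test = [t for _, t in test]
y_hat = [ensemble_predict(members, x) for x, _ in test]
print(f"MSE={mse(y_test, y_hat):.4f} R2={r_squared(y_test, y_hat):.3f} r={pearson_r(y_test, y_hat):.3f}")
```

In this sketch, bootstrap aggregation would differ mainly in training each member on a resample of `train` drawn with replacement, while noise injection would instead add random perturbations to the inputs at each training step.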
