Imputation of missing values in a precipitation–runoff process database

Hydrologists who build runoff prediction models often face missing values in precipitation-runoff process databases. They tend to handle the problem with simple, naive methods; the common practice so far has been to discard observations with missing values. In this paper, we present several statistically principled methods for gap filling and discuss their pros and cons. We apply imputation by self-organizing map (SOM), multilayer perceptron (MLP), multivariate nearest-neighbor (MNN), the regularized expectation-maximization algorithm (REGEM), and multiple imputation (MI) to a precipitation-runoff process database from northern Iran, with the aim of constructing a serially complete database for analyses such as runoff prediction. In our case, SOM and MNN tend to give similar and robust results. REGEM and MI rest on the assumption of multivariate normal data, which does not appear to hold in one of our cases. MLP tends to produce inferior results because it fragments the data into 68 different models. We therefore conclude that it makes most sense to use either the computationally simple MNN method or the more demanding SOM.
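To make the nearest-neighbor idea concrete, the following is a minimal sketch of one common form of nearest-neighbor imputation, not the implementation used in the paper. It assumes each record is a list of floats with `math.nan` marking a gap, measures distance only over variables observed in both records, and copies the missing values from the single closest donor. The function name `nn_impute` and the sample values are hypothetical.

```python
import math


def nn_impute(rows):
    """Fill NaN entries in each row from its nearest donor row.

    Distance is the root mean squared difference over the variables
    that are observed in both the incomplete row and the candidate donor.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [k for k, v in enumerate(row) if math.isnan(v)]
        if not missing:
            continue
        best_j, best_d = None, math.inf
        for j, other in enumerate(rows):
            # A donor must be a different row with values for every gap.
            if j == i or any(math.isnan(other[k]) for k in missing):
                continue
            shared = [k for k in range(len(row))
                      if not math.isnan(row[k]) and not math.isnan(other[k])]
            if not shared:
                continue
            d = math.sqrt(sum((row[k] - other[k]) ** 2 for k in shared) / len(shared))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            for k in missing:
                filled[i][k] = rows[best_j][k]
    return filled


# Hypothetical records: three variables per time step, one gap in the second row.
sample = [
    [1.0, 10.0, 100.0],
    [1.1, math.nan, 101.0],
    [5.0, 50.0, 500.0],
]
completed = nn_impute(sample)
```

In practice the variables would likely need to be standardized before computing distances so that no single variable dominates; that step is omitted here for brevity.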
