Comparison of the multiple imputation approaches for imputing rainfall data series and their applications to watershed models

Abstract Rainfall data scarcity causes serious problems in hydrologic and non-point source pollution (H/NPS) predictions, because rainfall data are the key input to watershed models. In this study, two multiple imputation methods for filling gaps in rainfall data, the data augmentation (DA) algorithm and the expectation maximization with bootstrap (EMB) algorithm, were compared. The effects of different data scarcity rates and periods on model performance and prediction uncertainty were then quantified. Finally, the effects of the different imputed data sets on H/NPS results were evaluated with the Soil and Water Assessment Tool (SWAT). A real case study in the Daning River watershed, Three Gorges Reservoir Region, China, was evaluated. The results indicated that rainfall data scarcity during low flow periods led to poorer model performance and larger prediction uncertainty, especially for minimum values, whereas maximum values were more susceptible to rainfall data scarcity during high flow periods. Both the imputed rainfall data and the H/NPS model performance obtained with the EMB algorithm were superior to those obtained with the traditional DA algorithm and the weather generator. This advantage of the EMB algorithm becomes more pronounced once a specific threshold of data scarcity is reached. Notably, even with the best algorithm, the imputed value was always lower than the peak observed value. These findings have important implications for the choice of imputation methods and for the use of H/NPS models in watershed studies affected by data scarcity.
