Estimating information entropy for hydrological data: One‐dimensional case

There has been a recent resurgence of interest in applying Information Theory to problems of system identification in the Earth and Environmental Sciences. While the concept of entropy has found increased application, little attention has yet been given to the practical problems of estimating entropy from two commonly used kinds of hydrologic data with unique characteristics: rainfall and runoff. In this paper, we discuss four issues of practical relevance that can bias the computation of entropy if not properly handled. The first (zero effect) arises when precipitation and ephemeral streamflow data must be viewed as arising from a discrete-continuous hybrid distribution because of the many zero values (e.g., days with no rain or no runoff). The second arises in the widely used bin-counting method for estimating PDFs, where significant error can be introduced if the bin width is not carefully selected. The third (measurement effect) arises because continuously varying hydrologic variables can typically be observed only discretely, to some finite degree of precision. The fourth (skewness effect) arises when the distribution of a variable is significantly skewed. Here we present an approach that can deal with all four of these issues, and test it with artificially generated and real hydrological data. The results indicate that the method is accurate and robust.
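
To make the zero effect and the bin-width dependence concrete, the following is a minimal illustrative sketch, not the estimator proposed in the paper. It assumes the standard decomposition of the entropy of a discrete-continuous mixture: the entropy of the zero/non-zero indicator plus the non-zero fraction times a bin-counting estimate of the differential entropy of the non-zero values. The function name hybrid_entropy and the fixed bin_width parameter are hypothetical, and results are in nats.

```python
import math
from collections import Counter


def hybrid_entropy(values, bin_width):
    """Illustrative entropy estimate (nats) for data with a point mass at zero.

    Assumption: H = H(zero indicator) + (1 - p_zero) * h(non-zero part),
    where h is estimated by bin counting with a fixed bin width.
    """
    values = [float(v) for v in values]
    nonzero = [v for v in values if v != 0.0]
    p_zero = 1.0 - len(nonzero) / len(values)

    # Discrete part: entropy of the "zero / non-zero" indicator.
    h_indicator = -sum(
        p * math.log(p) for p in (p_zero, 1.0 - p_zero) if p > 0.0
    )
    if not nonzero:
        return h_indicator

    # Continuous part: histogram (bin-counting) estimate of differential
    # entropy, -sum(p_i * log(p_i)) + log(bin_width).
    lo = min(nonzero)
    counts = Counter(math.floor((v - lo) / bin_width) for v in nonzero)
    probs = [c / len(nonzero) for c in counts.values()]
    h_continuous = -sum(p * math.log(p) for p in probs) + math.log(bin_width)

    return h_indicator + (1.0 - p_zero) * h_continuous


# Usage: a series with many dry days and a few wet-day depths (made-up numbers).
rain = [0, 0, 0, 0, 1.2, 0, 3.4, 0, 0.6, 0, 0, 7.9]
for width in (0.5, 2.0, 8.0):
    print(width, hybrid_entropy(rain, width))
```

Running the loop with different widths shows how strongly the estimate can depend on the bin width for a small, skewed sample, which is the sensitivity the abstract warns about.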
