An unbiased estimator of coefficient of variation of streamflow

Abstract: Given the increasing demand for high-frequency streamflow series (HFSS) at daily and subdaily time scales, there is an increasing need for reliable metrics of relative variability for such series. HFSS can exhibit enormous relative variability, especially in comparison with low-frequency streamflow series formed by aggregating HFSS. The product moment estimator of the coefficient of variation C, defined as the ratio of the sample standard deviation to the sample mean, as well as ten other common estimators of C, are shown to yield severely downward-biased and highly variable estimates of C for very long records of highly skewed and periodic HFSS, particularly for rivers that exhibit zero flows. Drawing on the theory of compound distributions, we introduce an estimator of C corresponding to a mixture of monthly zero-inflated lognormal distributions, denoted the delta lognormal monthly mixture (ΔLN3MM) model. Through monthly stratification, the ΔLN3MM model accounts for the seasonality, skewness, multimodality, and possible intermittency of HFSS. In comparisons among estimators, the ΔLN3MM-based estimator of C is shown to yield much more reliable and approximately unbiased estimates of C, not only for small samples but also for very large samples (tens of thousands of observations). We document values of C ranging from 0.18 to 42,000, with a median of 1.9 and an interquartile range of [1.34, 3.75], for 6807 daily streamflow series across the U.S. from the GAGES-II dataset; the highest values of C occur in arid and semiarid regions. A multivariate analysis and a national contour map reveal that extremely large values of C, never previously documented, tend to occur in arid watersheds with low runoff ratios, which also tend to exhibit a considerable number of zero streamflows.
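The compound-distribution idea behind a monthly mixture estimator of C can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' ΔLN3MM estimator: each month's flows are modeled as a zero-inflated two-parameter lognormal (the paper's model may use a different lognormal parameterization), the zero probability and the log-space mean and standard deviation are simple plug-in estimates, and each month's weight is its share of the observations. The function names `product_moment_cv`, `fit_monthly_delta_lognormal`, and `mixture_cv` are hypothetical. The mixture's first two moments are weighted sums of the component moments, and C follows from them.

```python
import math
import statistics
from collections import defaultdict


def product_moment_cv(flows):
    """Product moment estimator of C: sample standard deviation / sample mean."""
    return statistics.stdev(flows) / statistics.fmean(flows)


def fit_monthly_delta_lognormal(flows, months):
    """Fit a zero-inflated (delta) two-parameter lognormal to each month.

    Returns ({month: (p_zero, mu, sigma)}, {month: weight}), where weight is
    the fraction of all observations in that month. ASSUMPTION: mu and sigma
    are plug-in (maximum likelihood) estimates from the logs of the nonzero
    flows; this is illustrative and not necessarily the paper's fitting method.
    """
    by_month = defaultdict(list)
    for q, m in zip(flows, months):
        by_month[m].append(q)
    n = len(flows)
    params, weights = {}, {}
    for m, qs in by_month.items():
        logs = [math.log(q) for q in qs if q > 0]
        p_zero = 1.0 - len(logs) / len(qs)
        if logs:
            mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)
        else:
            mu, sigma = 0.0, 0.0  # all-zero month: contributes nothing below
        params[m] = (p_zero, mu, sigma)
        weights[m] = len(qs) / n
    return params, weights


def mixture_cv(params, weights):
    """C of a finite mixture of delta-lognormal components.

    For component m: E[X] = (1 - p) * exp(mu + sigma**2 / 2) and
    E[X**2] = (1 - p) * exp(2 * mu + 2 * sigma**2). The mixture moments are
    the weight-averaged component moments, and
    C = sqrt(E[X**2] - E[X]**2) / E[X].
    """
    m1 = m2 = 0.0
    for m, (p, mu, sigma) in params.items():
        w = weights[m]
        m1 += w * (1.0 - p) * math.exp(mu + 0.5 * sigma ** 2)
        m2 += w * (1.0 - p) * math.exp(2.0 * mu + 2.0 * sigma ** 2)
    if m1 <= 0.0:
        raise ValueError("mixture mean is zero; C is undefined")
    return math.sqrt(max(m2 - m1 ** 2, 0.0)) / m1


# Toy example: month 1 is intermittent (half zeros), month 2 is steady.
flows = [1, 1, 0, 0, 3, 3, 3, 3]
months = [1, 1, 1, 1, 2, 2, 2, 2]
params, weights = fit_monthly_delta_lognormal(flows, months)
print(mixture_cv(params, weights))  # approx. 0.742
print(product_moment_cv(flows))     # approx. 0.794
```

Because every nonzero flow within a month is identical in this toy series, the fitted mixture reproduces the empirical distribution exactly, so the two estimates differ only through the n versus n - 1 divisor. The downward bias described above arises for long, highly skewed, seasonal series and is not reproduced by a toy of this size.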