Performance of Some Estimators of Relative Variability

The classic coefficient of variation (CV) is the ratio of the standard deviation to the mean and can be used to compare normally distributed data with respect to their variability; it has been widely used in many fields. In the social sciences, the CV is used to evaluate demographic heterogeneity and social aggregates such as race, sex, education and others. Data of this nature are usually not normally distributed, and their distributional characteristics can vary widely. More accurate and robust variants of the classic CV are therefore needed to give a more realistic picture of the behaviour of collected data. In this work, we empirically evaluate five measures of relative variability, including the classic CV, for finite sample sizes via Monte Carlo simulations. Our purpose is to give insight into the behaviour of these estimators, as their performance has not previously been systematically investigated. To represent different behaviours of the data, we consider several statistical distributions that are frequently used to model data across various research fields. To enable comparisons, we choose parameters of these distributions that lead to a similar range of values for the CV. Our results indicate that CV estimators based on robust statistics of scale and location are more accurate and achieve the highest efficiency. Finally, we study the stability of a robust CV estimator in psychological and genetic data and compare the results with the traditional CV.
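The kind of comparison described above can be illustrated with a minimal Monte Carlo sketch. It is not the authors' simulation design: the abstract does not name the five estimators, so the robust variant used here (scaled median absolute deviation divided by the median), the contaminated normal model, and the names `classic_cv`, `robust_cv` and `simulate` are all assumptions made for illustration.

```python
import numpy as np


def classic_cv(x):
    """Classic CV: sample standard deviation divided by the sample mean."""
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=1) / np.mean(x)


def robust_cv(x):
    """Assumed robust variant: scaled MAD divided by the median.

    The factor 1.4826 makes the MAD consistent for the standard
    deviation under normality.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 1.4826 * mad / med


def simulate(n=50, reps=2000, contamination=0.05, seed=0):
    """Compare bias and RMSE of both estimators on contaminated normal data.

    Clean observations are N(10, 2), so the target CV is 0.2. A fraction
    `contamination` of each sample is replaced by draws from N(10, 20)
    (hypothetical settings, chosen only to show the effect of outliers).
    """
    rng = np.random.default_rng(seed)
    true_cv = 2.0 / 10.0
    k = int(round(contamination * n))  # number of contaminated points
    estimates = {"classic": np.empty(reps), "robust": np.empty(reps)}
    for r in range(reps):
        x = rng.normal(10.0, 2.0, size=n)
        x[:k] = rng.normal(10.0, 20.0, size=k)
        estimates["classic"][r] = classic_cv(x)
        estimates["robust"][r] = robust_cv(x)
    return {
        name: {
            "bias": float(np.mean(est) - true_cv),
            "rmse": float(np.sqrt(np.mean((est - true_cv) ** 2))),
        }
        for name, est in estimates.items()
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

Under these assumed settings the outliers inflate the sample standard deviation, so the classic estimator drifts away from the target while the median-based one stays close to it. With uncontaminated normal data the classic CV would be expected to be the more efficient of the two.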
