Uncertainty inherent in empirical fitting of distributions to experimental data

Treatment of experimental data often entails fitting frequency functions in order to draw inferences about the population underlying the sample at hand and/or to identify plausible mechanistic models. Several families of functions are currently in use, providing a broad range of forms; an overview is given in the light of historical developments, and some issues in identification and fitting procedures are considered. Except for fairly large, well-behaved data sets, empirical identification of the underlying distribution among a number of plausible candidates may turn out to be somewhat arbitrary, entailing a substantial uncertainty component. A pragmatic approach to the estimation of an approximate confidence region is proposed, based upon identification of a representative subset of distributions marginally compatible, at a given level, with the data at hand. A comprehensive confidence region is defined by the envelope of the subset of distributions considered, and indications are given to allow first-order estimation of the uncertainty component inherent in empirical distribution fitting.
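
The envelope idea can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the candidate families, the Kolmogorov-Smirnov test as the compatibility criterion, the level `alpha`, the synthetic sample, and the hypothetical helper name `envelope_of_compatible_fits`. As a simplification, it keeps every fitted family not rejected at the chosen level, rather than searching for distributions lying exactly at the margin of compatibility, and it takes the pointwise envelope of the retained cumulative distribution functions.

```python
import numpy as np
from scipy import stats


def envelope_of_compatible_fits(sample, candidates, alpha=0.05, n_grid=200):
    """Fit each candidate family and keep those not rejected at level alpha.

    Returns the evaluation grid, the pointwise lower and upper envelope of
    the retained fitted CDFs, and the names of the retained families.
    (Hypothetical helper; the compatibility criterion is an assumption.)
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.linspace(sample.min(), sample.max(), n_grid)
    kept, cdfs = [], []
    for name, dist in candidates.items():
        params = dist.fit(sample)  # maximum likelihood estimates
        # Caveat: the KS p-value is only approximate here, because the
        # parameters were estimated from the same sample being tested.
        p_value = stats.kstest(sample, dist.cdf, args=params).pvalue
        if p_value >= alpha:
            kept.append(name)
            cdfs.append(dist.cdf(grid, *params))
    if not kept:
        raise ValueError("no candidate family is compatible with the sample")
    cdfs = np.vstack(cdfs)
    return grid, cdfs.min(axis=0), cdfs.max(axis=0), kept


# Synthetic, moderately skewed sample (illustrative data only).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.4, size=300)

# Assumed candidate families; any scipy.stats continuous family would do.
candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
}

grid, lower, upper, kept = envelope_of_compatible_fits(sample, candidates)
print("retained families:", kept)
print("largest envelope width:", float(np.max(upper - lower)))
```

The width of the band between `lower` and `upper` gives a rough, first-order picture of how much the choice of family alone contributes to uncertainty; a smaller sample or a stricter `alpha` would be expected to change which families are retained.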
