Application of statistical methods for the comparison of data distributions

Data analysis is an essential part of every physics experiment, yet only a few standard analysis toolkits are available. For comparing distributions, almost all of these toolkits are limited to the Chi-squared test. Statistics offers a much wider family of Goodness-of-Fit tests: besides the Chi-squared test, there are tests based on maximum distance (Kolmogorov-Smirnov, Kuiper, Goodman) and tests based on quadratic distance (Fisz-Cramer-von Mises, Anderson-Darling, Tiku). All of these Goodness-of-Fit tests have been collected in a new open-source Statistical Toolkit. The Toolkit combines sophisticated statistical data treatment with advanced computing techniques: object-oriented technology, design patterns, and generic programming. None of the Goodness-of-Fit tests included in the system is optimal in every case. Statistics does not provide a universal recipe for choosing a test for a specific distribution, and the few available guidelines refer to comparisons between smooth theoretical distributions. To help the user choose an algorithm, we present the results of an intrinsic statistical comparison, in terms of relative efficiency, among many of the Goodness-of-Fit tests contained in the Statistical Toolkit.
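
The difference between the maximum-distance and quadratic-distance families named above can be illustrated with a minimal Python sketch that compares two samples through their empirical distribution functions. It is not the Statistical Toolkit's interface: the function names `ecdf`, `ks_distance` and `cvm_statistic` are hypothetical, and the quadratic statistic is assumed to take the two-sample Cramer-von Mises form with the sum running over the pooled sample. Only the test statistics are computed; p-values are left out.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    # Fraction of observations less than or equal to x.
    return lambda x: bisect_right(ordered, x) / n


def ks_distance(a, b):
    """Maximum-distance statistic (Kolmogorov-Smirnov type):
    the largest absolute gap between the two empirical distribution functions."""
    fa, fb = ecdf(a), ecdf(b)
    pooled = list(a) + list(b)
    # The gap can only change at observed values, so checking those is enough.
    return max(abs(fa(x) - fb(x)) for x in pooled)


def cvm_statistic(a, b):
    """Quadratic-distance statistic (two-sample Cramer-von Mises type):
    squared gaps summed over the pooled sample, scaled by n*m/(n+m)^2."""
    fa, fb = ecdf(a), ecdf(b)
    n, m = len(a), len(b)
    pooled = list(a) + list(b)
    total = sum((fa(x) - fb(x)) ** 2 for x in pooled)
    return n * m * total / (n + m) ** 2


if __name__ == "__main__":
    first = [1, 2, 3, 4]
    second = [3, 4, 5, 6]
    print("max distance:      ", ks_distance(first, second))
    print("quadratic distance:", cvm_statistic(first, second))
```

The maximum-distance statistic reacts only to the single largest discrepancy, whereas the quadratic statistic accumulates discrepancies over the whole range, which is one reason no single test is best for every pair of distributions.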
