Data analysis in HEP: a statistical toolkit

Statistical methods play a significant role throughout the life-cycle of HEP experiments and are an essential component of physics analysis. Historically, only a few basic tools for statistical analysis were available in the public-domain FORTRAN libraries for HEP, and the situation has hardly changed even among the libraries of the new generation. The project presented here, currently in progress, aims to develop an object-oriented software toolkit for statistical data analysis. In particular, the Statistical Comparison component of the toolkit provides algorithms for comparing data distributions in a variety of use cases typical of HEP experiments, such as regression testing (in various phases of the software life-cycle), validation of simulation through comparison to experimental data, comparison of expected versus reconstructed distributions, and comparison of data from different sources (for example, different sets of experimental data, or experimental versus theoretical distributions). The toolkit contains a variety of goodness-of-fit tests, from chi-squared and Kolmogorov-Smirnov to less well-known but generally much more powerful tests such as Anderson-Darling, Lilliefors, Cramér-von Mises and Kuiper. Thanks to its component-based design and its use of the standard AIDA interfaces, the tool can be used by other data analysis systems or integrated into experimental software frameworks. We present the architecture of the system, the statistical methods implemented, and the results of its first applications to the validation of the Geant4 Simulation Toolkit and to experimental data analysis.
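As an illustration of the kind of comparison the Statistical Comparison component addresses, the following is a minimal Python sketch of a two-sample Kolmogorov-Smirnov test on unbinned data. It is not the toolkit's implementation or API: the function names `ks_two_sample` and `kolmogorov_survival` are hypothetical, the p-value uses the standard asymptotic Kolmogorov series (so it is only approximate for small samples), and the cutoff of 0.2 below which the p-value is taken to be 1 is a choice made here for numerical convenience.

```python
import math
from bisect import bisect_right


def kolmogorov_survival(lam, terms=100):
    """Asymptotic Kolmogorov tail probability Q(lam) = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # Assumption of this sketch: Q is numerically indistinguishable from 1 here,
        # and the alternating series converges too slowly to be useful.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to guard against rounding just outside [0, 1].
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(a, b):
    """Return (D, p): the maximum ECDF distance and its asymptotic p-value."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # The difference between the two empirical CDFs only changes at observed
    # values, so it is enough to evaluate it at every data point.
    for v in xs + ys:
        fa = bisect_right(xs, v) / n  # fraction of sample a that is <= v
        fb = bisect_right(ys, v) / m  # fraction of sample b that is <= v
        d = max(d, abs(fa - fb))
    # Effective sample size scaling for the two-sample case.
    lam = math.sqrt(n * m / (n + m)) * d
    return d, kolmogorov_survival(lam)


if __name__ == "__main__":
    # Hypothetical data standing in for a simulated and a measured distribution.
    simulated = [0.11, 0.35, 0.42, 0.58, 0.63, 0.79, 0.91]
    measured = [0.15, 0.31, 0.47, 0.52, 0.68, 0.74, 0.88]
    d, p = ks_two_sample(simulated, measured)
    print(f"D = {d:.3f}, asymptotic p-value = {p:.3f}")
```

A small p-value would suggest that the two samples are unlikely to come from the same parent distribution. The other tests named in the abstract (Anderson-Darling, Cramér-von Mises, Kuiper) follow the same pattern of comparing empirical distribution functions but weight or combine the differences differently.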
