We present a project in progress to develop a software toolkit for statistical data analysis. The toolkit is based on advanced software technologies, integrating generic programming techniques with object-oriented methods, and adopts a rigorous software process to ensure a high-quality product. Thanks to its component-based architecture and its use of the standard AIDA interfaces, the tool can easily be used by other data analysis systems or integrated into experimental frameworks. The initial component of the system addresses goodness-of-fit tests. Its applications include the comparison of data distributions in a variety of use cases typical of high energy physics (HEP) experiments: regression testing (in various phases of the software life-cycle), validation of simulation through comparison to experimental data, comparison of expected versus reconstructed distributions, comparison of different experimental distributions (or of experimental distributions with theoretical ones) in physics analysis, and monitoring of detector behavior with respect to a reference in online data acquisition (DAQ). The system will let the user choose among a wide set of goodness-of-fit tests (chi-squared, Kolmogorov-Smirnov, Anderson-Darling, Lilliefors, Kuiper, Cramér-von Mises, etc.), specialised for various types of binned and unbinned distributions. Its flexible design leaves it open to extension with further tests. This system would represent a significant improvement over the comparison tests currently available in HEP libraries, which are limited to the chi-squared and Kolmogorov-Smirnov algorithms. We present the architecture of the toolkit, the detailed design of the basic statistical testing component, and preliminary results of its application, in particular concerning the physics validation of the Geant4 Simulation Toolkit. We discuss the openness of the project, welcoming contributions from experts and user requirements from experiments.
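To make the combination of generic programming and selectable tests more concrete, the following is a minimal C++ sketch, not the toolkit's actual design or API. It assumes unbinned samples stored as `std::vector<double>`, and the names `ComparisonTest` and `KolmogorovSmirnovDistance` are hypothetical. The algorithm is passed as a template parameter, and the example algorithm computes the two-sample Kolmogorov-Smirnov distance, that is, the largest absolute difference between the two empirical cumulative distribution functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical algorithm policy (illustration only, not the toolkit's API):
// two-sample Kolmogorov-Smirnov distance for unbinned data.
struct KolmogorovSmirnovDistance {
    // Returns the maximum absolute difference between the empirical CDFs
    // of samples a and b. The samples are taken by value so they can be sorted.
    static double compute(std::vector<double> a, std::vector<double> b) {
        assert(!a.empty() && !b.empty());
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());

        std::size_t i = 0, j = 0;
        double d = 0.0;
        while (i < a.size() && j < b.size()) {
            const double x = std::min(a[i], b[j]);
            // Step both samples past every value <= x, so ties are handled together.
            while (i < a.size() && a[i] <= x) ++i;
            while (j < b.size() && b[j] <= x) ++j;
            const double fa = static_cast<double>(i) / a.size();
            const double fb = static_cast<double>(j) / b.size();
            d = std::max(d, std::fabs(fa - fb));
        }
        return d;
    }
};

// Hypothetical generic front end: the comparison algorithm is chosen at
// compile time, so another test can be plugged in without changing this class.
template <class Algorithm>
class ComparisonTest {
public:
    double distance(const std::vector<double>& a,
                    const std::vector<double>& b) const {
        return Algorithm::compute(a, b);
    }
};
```

Under these assumptions, `ComparisonTest<KolmogorovSmirnovDistance>().distance(data, reference)` yields the distance only; converting it to a p-value, and handling binned distributions, would need additional components that are not sketched here.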