ROOT - A C++ framework for petabyte data storage, statistical analysis and visualization

ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community and designed for storing and analyzing petabytes of data efficiently. Any instance of a C++ class can be stored in a ROOT file in a machine-independent, compressed binary format. In ROOT, the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. To analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting, while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored in ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which takes care of optimally distributing the work over the available resources in a transparent way.

Preprint submitted to Elsevier 31 August 2015

PACS: 00; 07; 05
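The vertical data storage mentioned in the abstract can be illustrated with a minimal sketch in plain C++. It does not use ROOT or reproduce the TTree implementation; the names `Event`, `ColumnStore`, `fill`, `entries` and `meanEnergy` are hypothetical and chosen only for this example. The sketch assumes that each field of a record is kept in its own contiguous array, so that an analysis reading one quantity does not touch the others.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical event record, used only for this illustration.
struct Event {
    double energy;
    double momentum;
    int    charge;
};

// Vertical (column-wise) layout: one contiguous array per field,
// instead of one array of whole Event records.
struct ColumnStore {
    std::vector<double> energy;
    std::vector<double> momentum;
    std::vector<int>    charge;

    // Append one event by splitting it across the columns.
    void fill(const Event& e) {
        energy.push_back(e.energy);
        momentum.push_back(e.momentum);
        charge.push_back(e.charge);
    }

    // Number of stored events; all columns must have the same length.
    std::size_t entries() const {
        assert(energy.size() == momentum.size() &&
               momentum.size() == charge.size());
        return energy.size();
    }
};

// Reads only the energy column; momentum and charge are never accessed.
double meanEnergy(const ColumnStore& s) {
    if (s.energy.empty()) return 0.0;
    return std::accumulate(s.energy.begin(), s.energy.end(), 0.0) /
           static_cast<double>(s.energy.size());
}
```

Calling `fill` for each event and then `meanEnergy` on the store reads a single array. With a row-wise layout, the same computation would stride over every field of every record.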
