CRAB: A CMS application for distributed analysis

Starting in 2008, the CMS experiment will produce several petabytes of data every year, to be distributed over many computing centres located in different countries. The CMS computing model defines how the data are to be distributed and accessed so that physicists can run their analyses efficiently. Analysis will therefore be performed in a distributed way using the Grid infrastructure. CRAB (CMS Remote Analysis Builder) is a tool, designed and developed by the CMS collaboration, that gives end physicists transparent access to distributed data. CRAB interacts with the local user environment, the CMS Data Management services and the Grid middleware: it takes care of data and resource discovery; it splits the user task into several analysis processes (jobs) and distributes and parallelizes them over different Grid environments; and it handles job tracking and output retrieval. Very limited knowledge of the underlying technical details is required of the end user. The tool can be used as a direct interface to the computing system, or it can delegate the task to a server, which takes care of handling the user jobs and provides services such as automatic resubmission in case of failure and notification of the task status to the user. The current implementation can interact with the WLCG, gLite and OSG Grid middleware, and it provides access in the very same way to local data and batch systems such as LSF. CRAB has been in production and in routine use by end users since spring 2004. It has been used extensively in the studies for the Physics Technical Design Report, in the analysis of reconstructed event samples generated during the Computing, Software and Analysis challenges, and in the preliminary cosmic-ray data taking. The CRAB architecture and its usage within the CMS community are described in detail, together with the current status and future developments.
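As a rough illustration of the workflow described above, the sketch below shows what a minimal CRAB task configuration and the corresponding command sequence could look like. It assumes the CRAB2-era INI-style crab.cfg interface; the dataset path and file names are hypothetical, and the parameter names are indicative of that interface rather than a definitive reference.

    # crab.cfg -- minimal sketch of a CRAB task (CRAB2-era syntax; values are illustrative)
    [CRAB]
    jobtype   = cmssw          # run a CMSSW analysis job
    scheduler = glite          # gLite WMS; alternatives included condor_g (OSG) or lsf (local batch)

    [CMSSW]
    datasetpath            = /PrimaryDataset/ProcessedDataset/RECO   # hypothetical dataset, resolved via data discovery
    pset                   = analysis_cfg.py                         # the user's CMSSW configuration file
    total_number_of_events = -1                                      # process the whole dataset
    events_per_job         = 10000                                   # drives the automatic job splitting

    [USER]
    return_data    = 1         # retrieve output files together with the job
    ui_working_dir = my_task   # local working directory for this task

    # Typical command sequence for one task:
    crab -create               # data discovery, job splitting, job preparation
    crab -submit               # submission to the Grid (or delegation to the CRAB server)
    crab -status               # track the jobs of the task
    crab -getoutput            # retrieve the output of finished jobs

In server mode, the same commands apply, but submission, tracking and resubmission are delegated to the intermediate server rather than handled directly from the user interface.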
