The CMS experiment at the CERN LHC developed the Workflow Management Archive (WMArchive) system to persistently store unstructured framework job report (FWJR) documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS storage and the Hadoop Spark cluster. The system leverages modern technologies, such as a document-oriented database and the Hadoop ecosystem, to provide the flexibility necessary to reliably process, store, and aggregate $\mathcal{O}(1\mathrm{M})$ documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics and assist CMS data operators in assessing the performance of the CMS computing system.
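To illustrate the kind of aggregation such a pipeline performs, the sketch below uses PySpark to read FWJR-like JSON documents from HDFS and summarize them per site and day. The HDFS path, field names (`site`, `timestamp`, `wall_clock_sec`), and schema are hypothetical placeholders for illustration, not the actual WMArchive document layout.

```python
# Minimal sketch of a daily aggregation job over FWJR-like JSON documents
# stored on HDFS. The path and field names are assumed for illustration;
# the real WMArchive schema and storage layout may differ.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fwjr-aggregation").getOrCreate()

# Read one day's worth of documents; the HDFS path is hypothetical.
fwjr = spark.read.json("hdfs:///cms/wmarchive/fjr/2016/06/01/*.json")

# Aggregate: number of reports and average wall-clock time per site per day.
daily = (
    fwjr.withColumn("day", F.to_date(F.from_unixtime("timestamp")))
        .groupBy("site", "day")
        .agg(
            F.count("*").alias("n_reports"),
            F.avg("wall_clock_sec").alias("avg_wall_clock_sec"),
        )
)

daily.show()
spark.stop()
```

In a production pipeline, summaries like these would be written back to a short-term store feeding the operator dashboards rather than printed; `daily.show()` here merely displays a sample of the result.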