The LHCb HLT2 Storage System: A 40-GB/s System Made of Commercial Off-the-Shelf Components and Open-Source Software

The Large Hadron Collider beauty (LHCb) experiment is designed to study differences between particles and antiparticles, as well as very rare decays in the charm and beauty sector of the standard model, at the Large Hadron Collider (LHC). With the major upgrade completed in view of Run 3, the detector will read out all events at the full LHC bunch-crossing frequency of 40 MHz. The LHCb data acquisition (DAQ) system will therefore be subject to a considerably increased data rate, reaching a peak of 40 Tb/s. The second stage of the two-stage event filtering consists of more than 10 000 multithreaded processes, which simultaneously write output files at an aggregated bandwidth of 100 Gb/s. At the same time, a small number of file-moving processes will read files from the same storage to copy them to tape storage. This whole mechanism must run reliably for months and cope with significant load fluctuations. Moreover, for cost reasons, it must be built from commercial off-the-shelf components. In this article, we describe LHCb's solution to this challenge. We present the design, the reasons for the design choices, the configuration and tuning of the adopted software solution, and performance figures.
