Paper information - High performance data analysis via coordinated caches

High performance data analysis via coordinated caches

With the second run period of the LHC, high energy physics (HEP) collaborations will face increasing demands on computing infrastructure. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks, which in turn will need to process an increased volume of data. In addition to storage capacity, a key factor for future computing infrastructure is therefore the input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is either kept on dedicated storage and accessed via the network, or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacity, whereas local access allows processing capacity to scale linearly. With growing data volume and processing requirements, however, HEP will require both of these features. To enable adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on compute nodes, while any data remains available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data.
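The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a central service tracks file popularity and spreads the most popular files over per-node disk caches, while every other read falls back to background storage. The names `Coordinator`, `rebalance`, and `read`, the use of plain access counts as the popularity measure, and the per-node capacity expressed as a number of files are all assumptions made for illustration, not the KIT CMS group's actual design.

```python
from collections import Counter


class Coordinator:
    """Hypothetical central service that assigns popular files to node caches."""

    def __init__(self, node_capacity):
        # node_capacity maps a node name to the number of files its
        # local disk cache may hold (an assumed, simplified capacity unit).
        self.node_capacity = dict(node_capacity)
        self.access_counts = Counter()
        self.placement = {}  # file name -> node whose cache holds it

    def record_access(self, file_name):
        # Popularity is approximated by a plain access count (assumption).
        self.access_counts[file_name] += 1

    def rebalance(self):
        """Place files in order of popularity on the node with the most free slots."""
        free = dict(self.node_capacity)
        self.placement = {}
        for file_name, _count in self.access_counts.most_common():
            node = max(sorted(free), key=lambda n: free[n])
            if free[node] == 0:
                break  # all caches are full; remaining files stay uncached
            self.placement[file_name] = node
            free[node] -= 1


def read(coordinator, file_name, running_on):
    """Report where a job on node `running_on` would read `file_name` from."""
    coordinator.record_access(file_name)
    if coordinator.placement.get(file_name) == running_on:
        return "local cache"
    # Any file, cached or not, is always available from background storage.
    return "background storage"


coordinator = Coordinator({"node1": 1, "node2": 1})
for name, hits in [("a.root", 3), ("b.root", 2), ("c.root", 1)]:
    for _ in range(hits):
        coordinator.record_access(name)
coordinator.rebalance()
print(coordinator.placement)
print(read(coordinator, "a.root", running_on="node1"))
print(read(coordinator, "c.root", running_on="node1"))
```

In this sketch the coordination is loose in the sense that nodes never talk to each other: the coordinator only publishes a placement, and a job that misses its local cache simply reads from background storage. A real system would also have to decide where to schedule jobs and when to evict cached files, which this example leaves out.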
