BLonD++: performance analysis and optimizations for enabling complex, accurate and fast beam dynamics studies

This paper focuses on performance analysis and optimization to enable efficient implementations of next-generation beam dynamics simulations. Large research centers worldwide, such as CERN and Fermilab, continuously invest in resources and infrastructure to advance knowledge in particle physics. This requires careful studies and planning for upcoming synchrotron upgrades and for the design of future machines. Consequently, there is an emerging need for simulations that incorporate a range of complex physical phenomena and produce highly accurate predictions while keeping computing resources and run time to a minimum. A variety of simulator suites have been developed; however, they have been reported to fall short in simulation speed, features, and ease of use. In this paper, we introduce the Beam Longitudinal Dynamics (BLonD) simulator suite from a computer engineering perspective. We analyze its performance to identify its current bottlenecks and then enhance it further, with the aim of making complex, accurate, and fast beam dynamics simulations possible. We show that, through careful and targeted analysis and code tuning, the proposed BLonD++ implementation delivers significant performance gains, namely up to a 23X single-core speedup as well as improved scalability. This enables even more complex simulations than the current state of the art.
