The upgrade of the LHCb experiment at CERN envisions a data acquisition and event-filtering system that captures 100% of the data generated by the sub-detectors, which measure with great precision the 40 million proton collisions per second in CERN's Large Hadron Collider. The sensor readings amount to about 40 Tbit/s of data, which must be processed on a large computer farm. Because the CPU-based computation currently in use does not scale well, a good portion of the code must be accelerated to meet the computational demands of the proposed system. We are therefore looking for ways to accelerate the most time-consuming parts of the event-filtering code. The Ring Imaging Cherenkov (RICH) detectors are among the component detectors of the LHCb experiment. The Cherenkov photons that hit the detector are processed to determine the track of the original particle that emitted them. The particle velocity and mass, derived from the Cherenkov angle, are used to identify the particle. The complete RICH photon reconstruction algorithm accounts for 50% of the second High Level Trigger (HLT) process, and Cherenkov angle reconstruction accounts for about 20% of the RICH reconstruction, which makes it a good candidate for acceleration. An OpenCL implementation of the Cherenkov angle reconstruction algorithm, which calculates the trajectory of photons in the RICH detector, was developed. This paper compares the results of that OpenCL implementation on a Nallatech 385 card with an Altera Stratix V FPGA, an Nvidia GeForce GTX 690 GPU card, and an Intel Xeon processor. While the two GPUs are 3.6× faster than a single FPGA, the FPGA is 3.4× better than the two GPUs and 6.6× better than two multicore CPUs when energy efficiency is taken into account. Although OpenCL achieved a significant computational speedup on all of these architectures, a good portion of the gain was lost to the overhead of data transfer and parallelism.
Several strategies for improving the speedup are put forward. The discussion covers optimizations that are possible today, low-latency links that could replace PCIe, and possible changes to the OpenCL execution model itself.
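
The headline figures quoted above can be related with some back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that "energy efficiency" means throughput per watt measured on the same workload, so that a device that is both faster and less efficient must draw proportionally more power.

```python
def event_size_bytes(data_rate_bits_per_s, collision_rate_hz):
    """Average data volume per collision, in bytes.

    Assumes the quoted aggregate data rate is spread evenly over all collisions.
    """
    return data_rate_bits_per_s / collision_rate_hz / 8


def implied_power_ratio(speed_ratio, efficiency_ratio):
    """Power ratio (fast device / efficient device) implied by two quoted ratios.

    Assumption (not stated in the source): efficiency = throughput / power.
    If device A is `speed_ratio` times faster than device B, while B is
    `efficiency_ratio` times more efficient than A, then
    power_A / power_B = speed_ratio * efficiency_ratio.
    """
    return speed_ratio * efficiency_ratio


if __name__ == "__main__":
    # About 40 Tbit/s spread over 40 million collisions per second.
    print(event_size_bytes(40 * 10**12, 40 * 10**6), "bytes per collision")

    # Two GPUs: 3.6x faster than one FPGA; FPGA: 3.4x more energy efficient.
    print(round(implied_power_ratio(3.6, 3.4), 2), "x power, two GPUs vs. one FPGA")
```

Under these assumptions, each collision corresponds to roughly 1 Mbit (125 kB) of sensor data, and the two GPUs would draw roughly twelve times the power of the single FPGA. Both figures are derived estimates, not measurements reported by the authors.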