Improving packet processing performance in the ATLAS FELIX project: analysis and optimization of a memory-bounded algorithm

Experiments in high-energy physics (HEP) and related fields often impose constraints and challenges on data acquisition systems. As a result, these systems are implemented as unique mixtures of custom and commercial off-the-shelf (COTS) electronics that involve and connect radiation-hard devices, large high-performance networks, and computing farms. FELIX, the Frontend Link Exchange, is a new PC-based general-purpose data routing device for the data acquisition system of the ATLAS experiment at CERN. Performance is crucial for devices like FELIX, which must be capable of processing tens of gigabytes of data per second. It is therefore important to understand the performance limitations for typical workloads on modern hardware. This paper presents an analysis of the FELIX packet processing algorithm. The role played by the PC system's memory architecture in the overall data throughput is discussed and supported both by measurements and by theoretical considerations. Finally, optimizations that increase the processing throughput by more than a factor of 10 are analyzed.