InfiniBand for High Energy Physics

Distributed physics analysis techniques and parallel applications require a fast and efficient interconnect between compute nodes and I/O nodes. Besides the required bandwidth, the latency of message transfers is important, in particular in environments with many nodes. Ethernet is known to have high latencies of 30 μs to 60 μs for common Gigabit Ethernet hardware. The InfiniBand architecture is a relatively new, open industry standard. It defines a switched high-speed, low-latency fabric designed to connect compute nodes and I/O nodes with copper or fibre cables; the theoretical bandwidth is up to 30 Gbit/s. The Institute for Scientific Computing (IWR) at the Forschungszentrum Karlsruhe has been testing InfiniBand technology since the beginning of 2003 and runs a cluster of dual Xeon nodes using the 4X (10 Gbit/s) version of the interconnect. Bringing the RFIO protocol – which is part of the CERN CASTOR facilities for sequential file transfers – to InfiniBand has been a big success, allowing a significant reduction in CPU utilization and an increase in file transfer speed. Performance results of RFIO on Xeon and Opteron platforms are presented. To simulate a typical situation in the physics analysis of recently recorded data, the RFIO daemon is stress tested in a multi-user environment. A first prototype of a direct InfiniBand interface for the ROOT toolkit is currently being designed and implemented. Furthermore, experiences with hardware and software, in particular MPI performance results, are reported.
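
For context, the sketch below shows how a client performs a sequential remote read with the RFIO client library (rfio_open/rfio_read/rfio_close from CASTOR). It is a minimal illustration, not the paper's implementation: the remote path, buffer size, and header name are assumptions, and the InfiniBand transport of the port described here would sit below this unchanged application-level API.

/*
 * Minimal sketch of a sequential file read over RFIO, assuming the
 * standard CASTOR RFIO client calls rfio_open/rfio_read/rfio_close.
 * Path, buffer size, and header name are illustrative placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include "rfio_api.h"   /* CASTOR RFIO client header (name assumed) */

int main(void)
{
    const char *path = "server:/data/run1234.root";  /* hypothetical remote file */
    static char buf[1 << 20];                        /* 1 MiB transfer buffer */
    long long total = 0;
    int n;

    /* Open the remote file through the RFIO daemon on the server. */
    int fd = rfio_open((char *)path, O_RDONLY, 0);
    if (fd < 0) {
        perror("rfio_open");
        return EXIT_FAILURE;
    }

    /* Sequential reads; the daemon streams the data over the fabric. */
    while ((n = rfio_read(fd, buf, sizeof(buf))) > 0)
        total += n;

    rfio_close(fd);
    printf("transferred %lld bytes\n", total);
    return 0;
}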