APEnet+ 34 Gbps data transmission system and custom transmission logic

APEnet+ is a point-to-point, low-latency, 3D-torus network controller implemented on a PCIe Gen2 board based on the Altera Stratix IV FPGA. We characterize its transmission system, in which the FPGA's embedded transceivers drive external QSFP+ modules, by analyzing signal integrity, throughput, latency, BER, and jitter at data rates up to 34 Gbps. We estimate the efficiency of custom transmission logic that can sustain 2.6 GB/s per link with an FPGA on-chip memory footprint of 40 KB while providing deadlock-free routing and system-wide awareness of faults. Finally, we show preliminary results obtained with the embedded transceivers of a next-generation FPGA and outline some ideas for increasing performance within the same FPGA memory footprint.
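As a rough illustration of how the 2.6 GB/s sustained figure relates to the 34 Gbps raw rate, the sketch below works through the link budget. It assumes four lanes per QSFP+ link, 8b/10b line coding, and 1 GB = 10^9 bytes. The coding scheme and the byte convention are not stated in this abstract, and the function and constant names are purely illustrative, so the result is a back-of-the-envelope estimate rather than the authors' own calculation.

```python
# Back-of-the-envelope link budget for one link.
# ASSUMPTIONS (not taken from the abstract): 4 lanes per QSFP+ link,
# 8b/10b line coding (8 payload bits per 10 line bits), 1 GB = 1e9 bytes.

RAW_LINK_GBPS = 34.0            # aggregate raw line rate, from the abstract
SUSTAINED_GBYTES_PER_S = 2.6    # sustained bandwidth per link, from the abstract
LANES = 4                       # assumption: QSFP+ carries 4 lanes
CODING_EFFICIENCY = 8 / 10      # assumption: 8b/10b encoding


def link_budget(raw_gbps, lanes, coding_eff, sustained_gbytes):
    """Return (per-lane Gbps, payload ceiling in GB/s, protocol efficiency)."""
    per_lane_gbps = raw_gbps / lanes
    # Payload ceiling after line-coding overhead, converted from bits to bytes.
    payload_gbytes = raw_gbps * coding_eff / 8
    # Fraction of that ceiling actually delivered by the transmission logic.
    efficiency = sustained_gbytes / payload_gbytes
    return per_lane_gbps, payload_gbytes, efficiency


if __name__ == "__main__":
    per_lane, ceiling, eff = link_budget(
        RAW_LINK_GBPS, LANES, CODING_EFFICIENCY, SUSTAINED_GBYTES_PER_S
    )
    print(f"per-lane rate:   {per_lane:.2f} Gbps")
    print(f"payload ceiling: {ceiling:.2f} GB/s")
    print(f"efficiency:      {eff:.1%}")
```

Under these assumptions each lane runs at 8.5 Gbps, the payload ceiling is 3.4 GB/s, and the sustained 2.6 GB/s corresponds to roughly 76% of that ceiling. A different line coding would change the ceiling and therefore the efficiency estimate.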
