A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report

This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and on the management of faults and critical events, applied to many-process applications. This volume covers, among other topics: (1) LO|FA|MO, a system for awareness of faults and critical events on experimental heterogeneous many-core hardware platforms; (2) the integration and testing of QUoNG, the experimental heterogeneous many-core hardware platform based on the APEnet+ custom interconnect; (3) the design of a software-programmable Distributed Network Processor (DNP) architecture using ASIP technology; (4) the initial design stages of a new DNP generation targeting a 28 nm FPGA. This work was carried out within the framework of the EURETILE European Project under Grant Agreement no. 247846.
