Paper information - A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report

This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications. This volume covers several topics, including: (1) a system for awareness of faults and critical events (named LO|FA|MO) on experimental heterogeneous many-core hardware platforms; (2) the integration and test of the experimental heterogeneous many-core hardware platform QUonG, based on the APEnet+ custom interconnect; (3) the design of a Software-Programmable Distributed Network Processor (DNP) architecture using ASIP technology; (4) the initial stages of the design of a new DNP generation on a 28 nm FPGA. These developments were performed in the framework of the EURETILE European Project under Grant Agreement no. 247846.
