EURETILE Design Flow: Dynamic and Fault Tolerant Mapping of Multiple Applications Onto Many-Tile Systems

EURETILE investigates foundational innovations in the design of massively parallel tiled computing systems by introducing a novel parallel programming paradigm and a multi-tile hardware architecture. Each tile includes multiple general-purpose processors, specialized accelerators, and a fault-tolerant distributed network processor, which connects the tile to the inter-tile communication network. This paper focuses on the EURETILE software design flow, which provides a novel programming environment for mapping multiple dynamic applications onto a many-tile architecture. The high-level programming model specifies each application as a network of autonomous processes, which enables the automatic generation and optimization of the architecture-specific implementation. Behavioral and architectural dynamism is handled by a hierarchically organized runtime manager running on top of a lightweight operating system. To evaluate, debug, and profile the generated binaries, a scalable many-tile simulator has been developed. High system dependability is achieved by combining hardware-based fault-awareness strategies with software-based fault-reactivity strategies. We demonstrate the capability of the design flow to exploit the parallelism of many-tile architectures with various embedded and high-performance computing benchmarks targeting the virtual EURETILE platform with up to 192 tiles.
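
To make the idea of specifying an application as a network of autonomous processes more concrete, the following is a minimal sketch in Python. It is not the EURETILE programming interface: the names `producer`, `square`, `consumer`, `run_network`, and the `END` token are hypothetical, and threads with FIFO queues only stand in for processes and channels. Each process reads from its input channel with a blocking read and writes to its output channel, so the result does not depend on how the processes are scheduled.

```python
import queue
import threading

# Hypothetical end-of-stream token; not part of any EURETILE API.
END = object()


def producer(out_q, n):
    """Source process: emits the tokens 0..n-1, then the end token."""
    for i in range(n):
        out_q.put(i)
    out_q.put(END)


def square(in_q, out_q):
    """Worker process: squares every token it reads."""
    while True:
        token = in_q.get()  # blocking read from the input channel
        if token is END:
            out_q.put(END)  # forward the end token downstream
            return
        out_q.put(token * token)


def consumer(in_q, results):
    """Sink process: collects tokens until the end token arrives."""
    while True:
        token = in_q.get()
        if token is END:
            return
        results.append(token)


def run_network(n):
    """Builds the pipeline producer -> square -> consumer and runs it."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    results = []
    processes = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results
```

Because the processes interact only through their channels, a mapping tool is in principle free to place each one on a different processor or tile without changing the application code, which is the property the design flow relies on.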
