Fundamentals and Compiler Framework

Heterogeneous systems that include power-efficient hardware accelerators dominate the design of current and future embedded computer architectures, because they are a prerequisite for energy-efficient system design. In this context, we first discuss the main principles of invasive computing and then present the concept and structure of invasive tightly coupled processor arrays (TCPAs), which form the basis for our experiments throughout the book. Efficiently utilizing an invasive TCPA through the concrete invasive language InvadeX10 depends critically on compiler support. Without such support, programming that leverages the abundant parallelism of these architectures is very difficult, tedious, and error-prone. Unfortunately, compiler frameworks that generate efficient parallel code for massively parallel architectures are still lacking. In this chapter, we therefore present LoopInvader, the first compiler for mapping nested loop programs onto invasive TCPAs. We furthermore discuss the fundamentals and background of the underlying models for algorithm and application specification.