Massively Parallel Processor Architectures for Resource-aware Computing

We present a class of massively parallel processor architectures called invasive tightly coupled processor arrays (TCPAs). The presented processor class is a highly parameterizable template that can be tailored before runtime to fulfill customers' requirements such as performance, area cost, and energy efficiency. These programmable accelerators are well suited for domain-specific computing in signal, image, and video processing as well as other streaming applications. To overcome future scaling issues (e.g., power consumption, reliability, resource management, as well as application parallelization and mapping), TCPAs are inherently designed to support self-adaptivity and resource awareness at the hardware level. Here, we follow a recently introduced resource-aware parallel computing paradigm called invasive computing, in which an application can dynamically claim resources, execute on them, and release them. Furthermore, we show how invasive computing can be used as an enabler for power management. Finally, we introduce ideas on how to realize fault-tolerant loop execution on such massively parallel architectures by employing on-demand spatial redundancy at the processor array level.

I. INTRODUCTION

The steady miniaturization of feature sizes makes it possible to build increasingly complex Multi-Processor System-on-Chip (MPSoC) architectures, but it also raises numerous questions. These challenges include imperfections and unreliability of the devices as well as scalability problems of the architectures, for instance, what an optimal communication topology or memory architecture should look like. The situation is even more severe with respect to power consumption: chips can handle only a limited power budget, while technology shrinking continuously leads to higher energy densities. As a consequence, the potentially available chip area might not be fully utilized, or at least not simultaneously.
These phenomena are also known as the power wall and the utilization wall [1]. Other scalability issues, caused by the sheer complexity of exponential growth, concern resource management as well as parallelization and mapping approaches. This leads to the following conclusion: future systems will only scale if mapping and runtime methods improve considerably. This reasoning holds both for embedded and portable devices such as smartphones and tablets and for large-scale systems as used in high-performance computing. Customization and heterogeneity in the form of domain-specific components such as accelerators are the key to future performance gains [2].
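The claim/execute/release cycle of invasive computing described in the abstract can be illustrated with a small simulation. In the invasive computing literature these three phases are commonly called invade, infect, and retreat, and the sketch below borrows those names. However, the `ProcessorArray` class, its method signatures, and the linear (row-wise) invasion policy are hypothetical simplifications, not the TCPA hardware interface. Resource awareness appears in two places: `invade` may grant fewer processing elements (PEs) than requested, and `infect` partitions the loop over whatever was actually granted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

PE = Tuple[int, int]  # (row, column) coordinate of a processing element


@dataclass
class ProcessorArray:
    """Hypothetical rows x cols processor array that records which application owns each PE."""
    rows: int
    cols: int
    owner: Dict[PE, str] = field(default_factory=dict)  # PEs absent from the dict are free

    def invade(self, app: str, start: PE, n: int) -> List[PE]:
        """Claim up to n consecutive free PEs starting at `start` and moving right (linear invasion).

        The claim stops at the array border or at the first PE owned by another application,
        so the caller may receive fewer PEs than requested, or none at all.
        """
        r, c = start
        claim: List[PE] = []
        while 0 <= r < self.rows and 0 <= c < self.cols and len(claim) < n and (r, c) not in self.owner:
            self.owner[(r, c)] = app
            claim.append((r, c))
            c += 1
        return claim

    def infect(self, claim: List[PE], kernel: Callable[[int], int], n_iter: int) -> Dict[PE, List[int]]:
        """Run kernel(i) for i in range(n_iter), giving each claimed PE one contiguous block of iterations.

        Execution is simulated sequentially; the result maps each PE to the outputs of its block.
        """
        if not claim:
            raise ValueError("infect() needs a non-empty claim")
        block = -(-n_iter // len(claim))  # ceiling division: iterations per PE
        return {pe: [kernel(i) for i in range(k * block, min((k + 1) * block, n_iter))]
                for k, pe in enumerate(claim)}

    def retreat(self, claim: List[PE]) -> None:
        """Release the claimed PEs so that other applications can invade them."""
        for pe in claim:
            self.owner.pop(pe, None)


if __name__ == "__main__":
    array = ProcessorArray(rows=4, cols=8)
    claim = array.invade("app0", start=(0, 2), n=4)  # claims (0, 2) through (0, 5)
    per_pe = array.infect(claim, lambda i: i * i, n_iter=10)
    print(per_pe)
    array.retreat(claim)
```

In this sketch a second application that invades an already claimed region simply receives an empty claim and must retry later or elsewhere; this non-blocking behavior is an assumption chosen for simplicity.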
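The on-demand spatial redundancy for fault-tolerant loop execution mentioned in the abstract can likewise be sketched. A common form of spatial redundancy is N-modular redundancy with majority voting (three replicas give triple modular redundancy, TMR). The abstract does not state which scheme TCPAs use, so the functions `majority_vote` and `run_replicated` and the fault-injection hook below are hypothetical. Here the replicas run sequentially in one Python process, whereas on a processor array they would run on separately claimed PE regions.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical fault-injection hook: (replica index, iteration) -> corruption applied to the result
FaultMap = Dict[Tuple[int, int], Callable[[int], int]]


def majority_vote(values: Sequence[int]) -> Optional[int]:
    """Return the value produced by a strict majority of replicas, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def run_replicated(kernel: Callable[[int], int], n_iter: int,
                   replicas: int = 3, faults: Optional[FaultMap] = None) -> List[int]:
    """Execute every loop iteration on `replicas` redundant copies and vote on their outputs."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    faults = faults or {}
    results = []
    for i in range(n_iter):
        outputs = []
        for r in range(replicas):
            y = kernel(i)
            if (r, i) in faults:  # simulate a transient fault in replica r
                y = faults[(r, i)](y)
            outputs.append(y)
        voted = majority_vote(outputs)
        if voted is None:  # mismatch detected but not correctable
            raise RuntimeError(f"no majority at iteration {i}: {outputs}")
        results.append(voted)
    return results


if __name__ == "__main__":
    bit_flip = {(1, 3): lambda y: y ^ 0b100}  # flip bit 2 of replica 1's result in iteration 3
    print(run_replicated(lambda i: 2 * i, 5, replicas=3, faults=bit_flip))  # [0, 2, 4, 6, 8]
```

Choosing `replicas` at run time is what makes the redundancy "on demand" in this sketch: one replica gives no protection, two replicas detect a mismatch but cannot correct it, and three replicas mask a single faulty result.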

REFERENCES

[1] Jürgen Teich, et al. Hierarchical power management for adaptive tightly-coupled processor arrays, 2013, TODE.

[2] Narayanan Vijaykrishnan, et al. Exploiting Heterogeneity for Energy Efficiency in Chip Multiprocessors, 2011, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[3] Jürgen Teich, et al. A highly parameterizable parallel processor array architecture, 2006, 2006 IEEE International Conference on Field Programmable Technology.

[4] Kiyoung Choi, et al. Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture, 2010, 2010 NASA/ESA Conference on Adaptive Hardware and Systems.

[5] Mahmut T. Kandemir, et al. Compiler-assisted soft error detection under performance and energy constraints in embedded systems, 2009, TECS.

[6] Scott Mahlke, et al. Efficient soft error protection for commodity embedded microprocessors using profile information, 2012, LCTES 2012.

[7] Jürgen Teich, et al. Scalable Many-Domain Power Gating in Coarse-Grained Reconfigurable Processor Arrays, 2011, IEEE Embedded Systems Letters.

[8] Simha Sethumadhavan, et al. Distributed Microarchitectural Protocols in the TRIPS Prototype Processor, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[9] Steven Swanson, et al. Conservation cores: reducing the energy of mature computations, 2010, ASPLOS XV.

[10] David I. August, et al. SWIFT: software implemented fault tolerance, 2005, International Symposium on Code Generation and Optimization.

[11] Michael Nicolaidis. Time redundancy based soft-error tolerance to rescue nanometer technologies, 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No. PR00146).

[12] Frank Hannig, et al. Invasive Tightly-Coupled Processor Arrays, 2014, ACM Trans. Embed. Comput. Syst.

[13] Gerald H. Hilderink, et al. Parallel Processing - the picoChip way!, 2003.

[14] Yunheung Paek, et al. Selective validations for efficient protections on Coarse-Grained Reconfigurable Architectures, 2013, 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors.

[15] Jürgen Teich, et al. Decentralized dynamic resource management support for massively parallel processor arrays, 2011, ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors.

[16] Jürgen Teich, et al. System integration of tightly-coupled processor arrays using reconfigurable buffer structures, 2013, CF '13.

[17] Jürgen Teich, et al. Exploitation of Quality/Throughput Tradeoffs in Image Processing through Invasive Computing, 2013, PARCO.

[18] Jürgen Teich, et al. Invasive Algorithms and Architectures (Invasive Algorithmen und Architekturen), 2008, it Inf. Technol.

[19] Ming Zhang, et al. Combinational Logic Soft Error Correction, 2006, 2006 IEEE International Test Conference.

[20] Tommy Kuhn, et al. Low-Cost TMR for Fault-Tolerance on Coarse-Grained Reconfigurable Architectures, 2011, 2011 International Conference on Reconfigurable Computing and FPGAs.