Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration

As VLSI process technology scales into the nanometer regime, increasingly serious aging effects such as NBTI and HCI pose a significant threat to system reliability. Coarse-grained reconfigurable architectures (CGRAs) can dynamically reconfigure and execute different mapping schemes (maps), which can compensate for one another and thereby mitigate aging effectively. In this paper, we propose a two-stage stress-aware loop mapping algorithm for CGRA-mapped designs that combines intra-kernel and inter-kernel stress optimization. Building on pipelining techniques, the intra-kernel stress optimization employs a stress-aware force-directed method and an effective MCC (Maximal Compatibility Class) method to optimize the placement of operations and their distribution across processing elements (PEs), which avoids mapping too many operations onto the same PEs and reduces the accumulated stress. By leveraging dynamic reconfiguration, the inter-kernel stress optimization develops a multi-map scheduling method that dynamically reconfigures an ordered set of maps on the CGRA, which diversifies PE usage and lets the maps compensate for one another's stress on different PEs. Experimental results show that our approach reduces the maximum stress by 82.0% for NBTI and 70.4% for HCI, and improves the aging efficiency by 6.01X and the MTTF by 3.16X on average, while maintaining the optimized performance.
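The idea behind multi-map scheduling can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each map is summarized by a hypothetical per-PE stress vector, that stress simply adds up across reconfigurations, and it uses a plain greedy rule that picks, for each time slot, the map that keeps the largest accumulated stress on any PE as small as possible. The names `schedule_maps` and `max_stress` are illustrative only.

```python
from typing import List, Sequence


def schedule_maps(maps: Sequence[Sequence[float]], slots: int) -> List[int]:
    """Greedily order maps so the peak accumulated per-PE stress stays low.

    maps[m][p] is an assumed stress contribution of map m on PE p for one
    time slot. Returns the index of the map chosen for each slot.
    """
    accumulated = [0.0] * len(maps[0])
    order: List[int] = []
    for _ in range(slots):
        # Choose the map whose stress, added to what has already built up,
        # gives the smallest maximum over all PEs (ties go to the lowest index).
        best = min(
            range(len(maps)),
            key=lambda m: max(a + s for a, s in zip(accumulated, maps[m])),
        )
        order.append(best)
        accumulated = [a + s for a, s in zip(accumulated, maps[best])]
    return order


def max_stress(maps: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Peak accumulated stress on any PE after running the maps in `order`."""
    accumulated = [0.0] * len(maps[0])
    for m in order:
        accumulated = [a + s for a, s in zip(accumulated, maps[m])]
    return max(accumulated)


# Toy example: four maps, each stressing a different PE of a 4-PE array.
toy_maps = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
toy_order = schedule_maps(toy_maps, slots=4)
```

In this toy case the greedy rule rotates through all four maps, so the peak stress stays at one unit, whereas running the first map in every slot would accumulate four units on a single PE. A realistic stress model would also depend on duty cycle, temperature, and the NBTI and HCI mechanisms, none of which this sketch models.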
