Smart compilers for reliable and power-efficient embedded computing

Thanks to continuous technology scaling, smarter, faster, and smaller digital systems are now available at affordable cost. As a result, digital systems are used in a wide range of application areas that were previously unimagined, including medical devices (e.g., MRI, remote or post-operative monitoring devices), automotive systems (e.g., adaptive cruise control, anti-lock brakes), security systems (e.g., residential security gateways, surveillance devices), and in- and out-of-body sensing (e.g., capsules swallowed by patients to measure digestive-system pH, heart monitors). Such computing systems, which are completely embedded within the application, are called embedded systems, as opposed to general-purpose computing systems. In the design of embedded systems, power consumption and reliability are indispensable system requirements. In battery-operated portable devices, the battery is the single largest factor contributing to device cost, weight, recharging time and frequency, and ultimately usability. For example, in the Apple iPhone 4 smartphone, the battery accounts for 40% of the device weight, occupies 36% of its volume, and allows only 7 hours of talk time (over 3G). As embedded systems find use in a range of sensitive applications, from biomedical applications to safety and security systems, the reliability of the computations they perform becomes a crucial factor. At the current technology node, portable embedded systems can expect failures due to soft errors at a rate of once per year; with aggressive technology scaling, this rate is predicted to increase exponentially, to once per hour. Over the years, researchers have successfully developed techniques, implemented at different layers of the design spectrum, to improve system power efficiency and reliability.
Among the layers of design abstraction, I observe that the interface between the compiler and the processor micro-architecture possesses a unique potential for efficient design optimizations. A compiler designer is able to observe and analyze the application software at a finer granularity, while the processor architect analyzes the system output (power, performance, etc.) for each executed instruction. If the system knowledge at these two design layers can be integrated at the compiler/micro-architecture interface, design optimizations at both layers can be adapted to utilize the available resources efficiently and thereby achieve appreciable system-level benefits. To this effect, the thesis statement is that, “by merging system design information at the compiler and micro-architecture design layers, smart compilers can be developed, that achieve reliable and power-efficient embedded computing through: (i) Pure compiler techniques, (ii) Hybrid compiler micro-architecture techniques, and (iii) Compiler-aware architectures”. This dissertation demonstrates, through contributions in each of the three compiler-based techniques, the effectiveness of smart compilers in achieving power efficiency and reliability in embedded systems.
