Architecture support for disciplined approximate programming

Disciplined approximate programming lets programmers declare which parts of a program can be computed approximately and consequently at a lower energy cost. The compiler proves statically that all approximate computation is properly isolated from precise computation. The hardware is then free to selectively apply approximate storage and approximate computation with no need to perform dynamic correctness checks. In this paper, we propose an efficient mapping of disciplined approximate programming onto hardware. We describe an ISA extension that provides approximate operations and storage, which give the hardware freedom to save energy at the cost of accuracy. We then propose Truffle, a microarchitecture design that efficiently supports the ISA extensions. The basis of our design is dual-voltage operation, with a high voltage for precise operations and a low voltage for approximate operations. The key aspect of the microarchitecture is its dependence on the instruction stream to determine when to use the low voltage. We evaluate the power savings potential of in-order and out-of-order Truffle configurations and explore the resulting quality of service degradation. We evaluate several applications and demonstrate energy savings up to 43%.

[1]  C. Yeh,et al.  Layout techniques supporting the use of dual supply voltages for cell-based designs , 1999, Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361).

[2]  Naresh R. Shanbhag,et al.  Energy-efficient signal processing via algorithmic noise-tolerance , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[3]  Rob A. Rutenbar,et al.  Reducing power by optimizing the necessary precision/range of floating-point arithmetic , 2000, IEEE Trans. Very Large Scale Integr. Syst..

[4]  Ankur Srivastava,et al.  On gate level power optimization using dual-supply voltages , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[5]  Amin Vahdat,et al.  ECOSystem: managing energy as a first class operating system resource , 2002, ASPLOS X.

[6]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[7]  David Zhang,et al.  Secure program execution via dynamic information flow tracking , 2004, ASPLOS XI.

[8]  Margaret Martonosi,et al.  A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[9]  M. Valero,et al.  Fuzzy memoization for floating-point multimedia applications , 2005, IEEE Transactions on Computers.

[10]  Donald Yeung,et al.  Exploiting Soft Computing for Increased Fault Tolerance , 2006 .

[11]  Krishna V. Palem,et al.  Ultra-Efficient (Embedded) SOC Architectures based on Probabilistic CMOS (PCMOS) Technology , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[12]  Vicky Wong,et al.  Soft Error Resilience of Probabilistic Inference Applications , 2006 .

[13]  Christoforos E. Kozyrakis,et al.  Raksha: a flexible information flow architecture for software security , 2007, ISCA '07.

[14]  Norman P. Jouppi,et al.  Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  Frederic T. Chong,et al.  Complete information flow tracking from the gates up , 2009, ASPLOS.

[16]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17]  K. Sankaralingam,et al.  Exploring the Synergy of Emerging Workloads and Silicon Reliability Trends , 2009 .

[18]  Karthikeyan Sankaralingam,et al.  Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.

[19]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[20]  Douglas L. Jones,et al.  Scalable stochastic processors , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[21]  Caro Lucas,et al.  Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications , 2010, IEEE Transactions on Circuits and Systems I: Regular Papers.

[22]  John Sartori,et al.  Designing a processor from the ground up to allow voltage/reliability tradeoffs , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[23]  Quinn Jacobson,et al.  ERSA: error resilient system architecture for probabilistic applications , 2010, DATE 2010.

[24]  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI.

[25]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[26]  Song Liu,et al.  Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.

[27]  Karthikeyan Sankaralingam,et al.  Dark silicon and the end of multicore scaling , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[28]  Andrew A. Chien,et al.  The future of microprocessors , 2011, Commun. ACM.

[29]  Gu-Yeon Wei,et al.  A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation , 2011, 2011 IEEE International Solid-State Circuits Conference.

[30]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.