b-HiVE: A bit-level history-based error model with value correlation for voltage-scaled integer and floating point units

Existing timing error models for voltage-scaled functional units ignore the effects of history and of correlation among outputs, as well as the variation in error behavior across bit locations. We propose b-HiVE, a model for voltage-scaling-induced timing errors that incorporates these attributes, and we demonstrate their impact on overall model accuracy. On average across several operations, b-HiVE's estimation is within 1-3% of comprehensive analog simulations, which corresponds to 5-17x higher accuracy (6-10x on average) than error models currently used in approximate computing research. To the best of our knowledge, we present the first bit-level error models of arithmetic units, and the first error models for voltage scaling of bitwise logic operations and floating-point units.
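To make the idea of a bit-level, history-dependent error model concrete, the following is a minimal Python sketch. It is not the b-HiVE model itself. It assumes a toy mechanism in which a bit that misses timing keeps the value it had in the previous output, so an error can only appear on bits that differ between the previous and the correct result. The function name `inject_timing_errors`, the 8-bit width, and the per-bit probabilities in `BIT_ERROR_PROB` are hypothetical choices made for illustration. Value correlation among outputs is not modeled here.

```python
import random

# Hypothetical per-bit error probabilities for an 8-bit unit (index 0 = LSB).
# These numbers are illustrative assumptions, not measured data.
BIT_ERROR_PROB = [0.0, 0.0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3]


def inject_timing_errors(correct, previous, bit_error_prob, rng):
    """Return `correct` with toy history-dependent, per-bit timing errors.

    correct        -- the error-free output of the current operation
    previous       -- the output the unit produced for the prior operation
    bit_error_prob -- one error probability per bit position (assumed values)
    rng            -- a random.Random instance, passed in for reproducibility
    """
    result = correct
    for bit, prob in enumerate(bit_error_prob):
        mask = 1 << bit
        toggled = (correct ^ previous) & mask
        # In this toy model a bit that does not change from the previous
        # output cannot show an error: its stale value is already correct.
        if toggled and rng.random() < prob:
            result ^= mask  # the bit keeps its stale (previous) value
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    previous = 0b01010101
    correct = 0b10101010
    observed = inject_timing_errors(correct, previous, BIT_ERROR_PROB, rng)
    print(f"correct={correct:08b} observed={observed:08b}")
```

Making the error probability depend on both the bit position and the previous output is what separates this sketch from a model that flips every bit with one uniform probability. The actual conditioning used by b-HiVE may differ.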
