PPU: A Control Error-Tolerant Processor for Streaming Applications with Formal Guarantees

With increasing technology scaling and design complexity there are increasing threats from device and circuit failures. This is expected to worsen with post-CMOS devices. Current error-resilient solutions ensure reliability of circuits through protection mechanisms such as redundancy, error correction, and recovery. However, the costs of these solutions may be high, rendering them impractical. In contrast, error-tolerant solutions allow errors in the computation and are positioned to be suitable for error-tolerant applications such as media applications. For such programmable error-tolerant processors, the Instruction-Set-Architecture (ISA) no longer serves as a specification since it is acceptable for the processor to allow for errors during the execution of instructions. In this work, we address this specification gap by defining the basic requirements needed for an error-tolerant processor to provide acceptable results. Furthermore, we formally define properties that capture these requirements. Based on this, we propose the Partially Protected Uniprocessor (PPU), an error-tolerant processor that aims to meet these requirements with low-cost microarchitectural support. These protection mechanisms convert potentially fatal control errors to potentially tolerable data errors instead of ensuring instruction-level or byte-level correctness. The protection mechanisms in PPU protect the system against crashes, unresponsiveness, and external device corruption. In addition, they also provide support for achieving acceptable result quality. Additionally, we provide a methodology that formally proves the specification properties on PPU using model checking. This methodology uses models for the hardware and software that are integrated with the fault and recovery models. Finally, we experimentally demonstrate the results of model checking and the application-level quality of results for PPU.

[1]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[2]  장훈,et al.  [서평]「Computer Organization and Design, The Hardware/Software Interface」 , 1997 .

[3]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[4]  Jack B. Dennis,et al.  Virtual memory, processes, and sharing in Multics , 1967, CACM.

[5]  Wei Wu,et al.  Energy-efficient cache design using variable-strength error-correcting codes , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[6]  Tania Stathaki,et al.  Image Fusion: Algorithms and Applications , 2008 .

[7]  Edmund M. Clarke,et al.  State space reduction using partial order techniques , 1999, International Journal on Software Tools for Technology Transfer.

[8]  Martin C. Rinard,et al.  Lax: Driver Interfaces for Approximate Sensor Device Access , 2015, HotOS.

[9]  Shubu Mukherjee,et al.  Architecture Design for Soft Errors , 2008 .

[10]  Martin C. Rinard,et al.  Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.

[11]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[12]  M. D. Giles,et al.  Process Technology Variation , 2011, IEEE Transactions on Electron Devices.

[13]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[14]  Martin C. Rinard,et al.  Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.

[15]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[16]  David Blaauw,et al.  Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation , 2003, MICRO.

[17]  Qiang Xu,et al.  Approximate Computing: A Survey , 2016, IEEE Design & Test.

[18]  References , 1971 .

[19]  Junfeng Yang,et al.  Using model checking to find serious file system errors , 2004, TOCS.

[20]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[21]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[22]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[23]  Naresh R. Shanbhag,et al.  Energy-efficient signal processing via algorithmic noise-tolerance , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[24]  Andrea C. Arpaci-Dusseau,et al.  Analysis and Evolution of Journaling File Systems , 2005, USENIX Annual Technical Conference, General Track.

[25]  Naresh R. Shanbhag,et al.  Soft N-Modular Redundancy , 2012, IEEE Transactions on Computers.

[26]  Jack B. Dennis,et al.  Segmentation and the Design of Multiprogrammed Computer Systems , 1965, JACM.

[27]  Joseph Sifakis,et al.  Model checking , 1996, Handbook of Automated Reasoning.

[28]  Steven M. Nowick,et al.  ACM Journal on Emerging Technologies in Computing Systems , 2010, TODE.

[29]  Sharad Malik,et al.  Extracting useful computation from error-prone processors for streaming applications , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[30]  Albert Meixner,et al.  Argus: Low-Cost, Comprehensive Error Detection in Simple Cores , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[31]  Song Liu,et al.  Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.

[32]  Sparsh Mittal,et al.  A Survey of Techniques for Approximate Computing , 2016, ACM Comput. Surv..

[33]  Fausto Giunchiglia,et al.  NUSMV: a new symbolic model checker , 2000, International Journal on Software Tools for Technology Transfer.

[34]  Sharad Malik,et al.  CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution , 2015, ASPLOS.

[35]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[36]  Gary S. Tyson,et al.  Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[37]  Sharad Malik,et al.  Error-tolerant processors: Formal specification and verification , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[38]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[39]  Subhasish Mitra,et al.  ERSA: Error Resilient System Architecture for probabilistic applications , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[40]  Douglas L. Jones,et al.  Scalable stochastic processors , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[41]  Matthew J. Parkinson,et al.  Uniqueness and reference immutability for safe parallelism , 2012, OOPSLA '12.

[42]  Andrew W. Appel,et al.  Modern Compiler Implementation in ML , 1997 .

[43]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.