Rumba: An online quality management system for approximate computing

Approximate computing can be applied to an emerging class of applications from domains such as multimedia, machine learning, and computer vision. The approximated output of such applications, even though not 100% numerically correct, is often still useful, or the difference is unnoticeable to the end user. This opens up a new design dimension in which application performance and energy consumption can be traded off against output correctness. However, a largely unaddressed challenge is quality control: how to ensure that the user experience meets a prescribed level of quality. Current approaches either do not monitor output quality at all or check only a small sampled subset of the output, assuming that it is representative. While these approaches have been shown to produce acceptable average errors, they often miss large errors and provide no means to take corrective action. To overcome this challenge, we propose Rumba, an online system for detecting and correcting large approximation errors in an approximate-accelerator-based computing environment. Rumba performs continuous lightweight checks in the accelerator to detect large approximation errors and then fixes these errors by exact re-computation on the host processor. For efficient detection, these checks use computationally inexpensive output error prediction models. Computing patterns amenable to approximation (e.g., map and stencil) are usually data parallel, and Rumba exploits this property for selective correction: only the elements flagged as having large errors need to be recomputed. Overall, for a set of applications from different approximate computing domains, Rumba achieves a 2.1x reduction in output error compared to an unchecked approximate accelerator while maintaining the accelerator's performance gains, at the cost of reducing the energy savings from 3.2x to 2.2x.
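The detect-then-correct flow described above can be illustrated with a small software sketch. Everything in it is hypothetical and chosen only for illustration: `approx_kernel` stands in for the approximate accelerator (a tangent-line approximation of `sqrt` around 1), `exact_kernel` stands in for the exact host code, and the error predictor is an assumed one-parameter linear model on a cheap input feature, fitted offline by least squares. It is not Rumba's hardware design or its actual prediction models; it only shows how a per-element error prediction, a threshold, and the independence of map elements combine to allow selective re-computation.

```python
import math
from typing import List, Sequence, Tuple


def exact_kernel(x: float) -> float:
    """Exact computation; runs on the host only for elements flagged by the check."""
    return math.sqrt(x)


def approx_kernel(x: float) -> float:
    """Hypothetical approximate accelerator: tangent-line approximation of sqrt at x = 1."""
    return 0.5 * (1.0 + x)


def feature(x: float) -> float:
    """Cheap input feature used by the error predictor (an assumed choice)."""
    return (x - 1.0) ** 2


def fit_error_model(training_inputs: Sequence[float]) -> float:
    """Offline profiling: fit |error| ~= w * feature(x) by least squares through the origin."""
    num = 0.0
    den = 0.0
    for x in training_inputs:
        f = feature(x)
        err = abs(approx_kernel(x) - exact_kernel(x))
        num += f * err
        den += f * f
    return num / den if den > 0.0 else 0.0


def checked_map(
    inputs: Sequence[float], w: float, threshold: float
) -> Tuple[List[float], List[int]]:
    """Approximate map with a per-element error check and selective exact re-computation."""
    # Accelerator pass: every element is approximated independently (map pattern).
    outputs = [approx_kernel(x) for x in inputs]
    # Lightweight check: predict each element's error from its input alone.
    flagged = [i for i, x in enumerate(inputs) if w * feature(x) > threshold]
    # Host pass: re-execute only the flagged elements exactly; others are untouched.
    for i in flagged:
        outputs[i] = exact_kernel(inputs[i])
    return outputs, flagged


if __name__ == "__main__":
    w = fit_error_model([0.25 * k for k in range(1, 17)])
    data = [0.5, 0.9, 1.0, 1.2, 3.0, 6.0]
    out, fixed = checked_map(data, w, threshold=0.05)
    print("recomputed on host:", fixed)  # recomputed on host: [4, 5]
```

Because each map element is computed independently, re-executing only the flagged indices on the host leaves the other outputs untouched, so correction costs work proportional to the number of flagged elements rather than to the whole output. The `threshold` parameter (an assumption of this sketch) sets the trade-off: lowering it flags more elements, which reduces output error but also erodes the energy savings, mirroring the error-versus-energy trade-off reported above.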
