Approximate computing: An integrated cross-layer framework

Venkataramani, Swagath PhD, Purdue University, December 2016. Approximate Computing: An Integrated Cross-layer Framework. Major Professor: Anand Raghunathan.

We have witnessed a fundamental shift in the nature of workloads executed by computing platforms across the spectrum, from mobile and deeply embedded devices to servers and data centers. Increasingly, computing platforms need to analyze, organize, and search through large amounts of real-world data, interact intelligently with the physical world, be context-aware, and present more natural human interfaces. These tasks do not involve computing a golden answer or a unique numerical result. Instead, they need to produce outputs that are good enough, that is, of sufficient quality. Such workloads possess intrinsic application resilience: the ability to produce outputs of acceptable quality even when a large fraction of their computations are performed imprecisely or approximately.

Intrinsic application resilience offers an entirely new dimension along which computing platforms can be optimized. However, the design of computing platforms continues to be guided by the dogma that every computation must be executed with the same strict notion of correctness. With the demand for computing performance growing unabated on the one hand, and the traditional benefits of technology scaling diminishing on the other, it is important to leverage this new source of efficiency.

A new design approach, called approximate computing (AxC), leverages the flexibility provided by intrinsic application resilience to realize hardware or software implementations that are more efficient in energy or performance. Approximate computing techniques forsake exact (numerical or Boolean) equivalence in the execution of some of the application’s computations, while ensuring that the output quality is
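To make the idea of forsaking exact numerical equivalence concrete, here is a minimal Python sketch. It is illustrative only and not a technique taken from this thesis. It assumes a hypothetical approximate adder, `approx_add`, that adds the upper bits exactly but combines the `k` low-order bits with a bitwise OR, so no carry is generated in the low part. The names `approx_add`, `approx_sum`, `relative_error`, and the synthetic pixel data are all assumptions made for illustration. In hardware, the benefit would come from a shorter carry chain, which is not modeled here. The sketch only shows the quality side of the trade-off. The error of each addition is exactly the low `k` bits of `a & b`, so it is always an underestimate bounded by `2**k - 1`.

```python
import random
from typing import Iterable


def approx_add(a: int, b: int, k: int) -> int:
    """Add two non-negative integers, approximating the k low-order bits.

    Hypothetical carry-free approximate adder (illustrative only):
    the upper bits are added exactly, while the k low-order bits are
    combined with a bitwise OR, so no carry is generated in the low part.
    The result never exceeds a + b; the error equals (a & b) restricted
    to the low k bits, i.e. at most 2**k - 1.
    """
    if k <= 0:
        return a + b  # k = 0 degenerates to an exact adder
    mask = (1 << k) - 1
    high = ((a >> k) + (b >> k)) << k  # exact addition of the upper bits
    low = (a & mask) | (b & mask)      # carry-free approximation of the low bits
    return high | low                  # low < 2**k, so OR acts as addition here


def approx_sum(values: Iterable[int], k: int) -> int:
    """Accumulate values with approx_add; per-step errors add up."""
    total = 0
    for v in values:
        total = approx_add(total, v, k)
    return total


def relative_error(exact: int, approx: int) -> float:
    """Output-quality metric: relative deviation from the exact result."""
    return abs(exact - approx) / abs(exact) if exact else 0.0


if __name__ == "__main__":
    random.seed(0)
    # Synthetic 8-bit "pixel" data standing in for an error-resilient workload.
    pixels = [random.randrange(256) for _ in range(10_000)]
    exact = sum(pixels)
    for k in (0, 2, 4, 6):
        approx = approx_sum(pixels, k)
        print(f"k={k}: relative error of the sum = {relative_error(exact, approx):.4%}")
```

Here `k` acts as a quality knob. Raising it would shorten the carry chain of a hardware adder at the cost of a larger, but bounded, error. An application-level quality metric, such as `relative_error` in this sketch, decides how large `k` can be before the output stops being good enough.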
