Efficient Local Memory Support for Approximate Computing

Given the saturation of single-threaded performance improvements in General-Purpose Processors (GPPs), novel architectural techniques are required to meet emerging performance demands. In this paper, we propose a generic acceleration framework for approximate algorithms that replaces computation with table look-up accesses in dedicated memories. At compile time, annotated application kernels are automatically profiled with sample inputs, and the most representative input-output mappings of each kernel are selected with K-Means clustering and stored in the program binary. At runtime, these mappings are loaded into dedicated look-up tables, and kernel execution is replaced by hardware execution of a Nearest-Centroid Classifier, which fetches from memory the output associated with the region that best matches the current input. We compare the proposed approach with a similar framework based on neural acceleration and show that, at similar quality levels, it achieves on average three times better performance and energy efficiency with significant area savings, opening new opportunities for performance harvesting in approximate accelerators.
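The two-phase flow can be illustrated with a minimal sketch. The Python snippet below is an illustrative software model of the idea, not the paper's implementation: the names `approximate_kernel`, `build_lut`, and `lut_lookup` are hypothetical, and the choice of clustering the kernel's input space (storing one centroid and one averaged output per cluster) is an assumption about how the representative input-output mappings are formed.

```python
import numpy as np
from sklearn.cluster import KMeans

def approximate_kernel(x):
    """Placeholder for an annotated, error-tolerant kernel (illustrative only)."""
    return np.sin(x[0]) * np.cos(x[1])

def build_lut(sample_inputs, n_entries=64):
    """Compile-time phase (assumed): profile the kernel on sample inputs,
    cluster the inputs with k-means, and keep, for each cluster, its centroid
    together with the mean kernel output observed in that cluster."""
    samples = np.asarray(sample_inputs)
    km = KMeans(n_clusters=n_entries, n_init=10).fit(samples)
    outputs_all = np.array([approximate_kernel(s) for s in samples])
    outputs = np.array([outputs_all[km.labels_ == k].mean()
                        for k in range(n_entries)])
    return km.cluster_centers_, outputs

def lut_lookup(centroids, outputs, x):
    """Runtime phase: nearest-centroid classification replaces kernel execution;
    return the stored output of the closest centroid."""
    idx = np.argmin(np.linalg.norm(centroids - x, axis=1))
    return outputs[idx]

# Usage: build the table from random sample inputs, then approximate a new input.
rng = np.random.default_rng(0)
centroids, outputs = build_lut(rng.uniform(-np.pi, np.pi, size=(10000, 2)))
print(lut_lookup(centroids, outputs, np.array([0.3, 1.2])))
```

In hardware, the nearest-centroid search of `lut_lookup` would be performed by the dedicated look-up table structure rather than in software; the sketch only conveys the functional behavior.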
