EMEURO: A framework for generating multi-purpose accelerators via deep learning

Approximate computing is a very promising design paradigm for crossing the CPU power wall, primarily driven by the potential to sacrifice output quality for significant gains in performance, energy, and fault tolerance. Unfortunately, existing solutions have primarily either focused on new programming models, or new hardware designs, leaving significant room between these two ends for software-based optimizations. To fill this void, additional efforts should target the compilation and runtime stages, which have a critical impact on controlling the interactions of the many approximate subcomputations to form a well-optimized application. This paper presents EMEURO, a neural-network (NN) based emulation and acceleration platform. By restructuring algorithms to have the same data flow as a NN, EMEURO is able to achieve significant speedup across several domains with minimal design effort. EMEURO uses novel NN-based approximate computing techniques, including methods for efficiently searching the high-dimension subroutine space, and fine-grain control of error during runtime. EMEURO is able to achieve 7×-109× maximum speedup over the original algorithm with 0.1%-10% approximation error.

[1]  Henry Hoffmann,et al.  Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.

[2]  Martin Rinard,et al.  Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .

[3]  Kunle Olukotun,et al.  The Future of Microprocessors , 2005, ACM Queue.

[4]  Martin C. Rinard Using early phase termination to eliminate load imbalances at barrier synchronization points , 2007, OOPSLA.

[5]  Johannes Schemmel,et al.  Wafer-scale integration of analog neural networks , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[6]  Jacob Nelson,et al.  SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[7]  Christoforos E. Kozyrakis,et al.  Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.

[8]  Scott A. Mahlke,et al.  Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.

[9]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[10]  Jacob Nelson,et al.  Approximate storage in solid-state memories , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[12]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[13]  Scott A. Mahlke,et al.  SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  J. Tukey,et al.  An algorithm for the machine calculation of complex Fourier series , 1965 .

[15]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[16]  Kunle Olukotun,et al.  A highly scalable Restricted Boltzmann Machine FPGA implementation , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[17]  Krishna V. Palem,et al.  Ultra-Efficient (Embedded) SOC Architectures based on Probabilistic CMOS (PCMOS) Technology , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[18]  Olivier Temam,et al.  A defect-tolerant accelerator for emerging high-performance applications , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[19]  Dhananjay S. Phatak,et al.  Investigating the Fault Tolerance of Neural Networks , 2005, Neural Computation.

[20]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[21]  Alan Edelman,et al.  PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.

[22]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[23]  Alan Edelman,et al.  Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[24]  Martin C. Rinard Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.

[25]  Jie Han,et al.  Approximate computing: An emerging paradigm for energy-efficient design , 2013, 2013 18th IEEE European Test Symposium (ETS).

[26]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[27]  I. G. Persiantsev,et al.  Multifold Acceleration of Neural Network Computations Using GPU , 2009, ICANN.

[28]  Mikko H. Lipasti,et al.  BenchNN: On the broad potential application scope of hardware neural network accelerators , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).

[29]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.