Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators

In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for reducing energy consumption in applications that can tolerate a degree of inaccuracy. The approach trades a tolerable amount of application accuracy for significant resource savings (energy consumed, critical-path delay, and silicon area), but so far it has been limited to application-specific integrated circuits (ASICs). As currently designed, these ASIC realizations have a narrow application scope and are often rigid in their tolerance to inaccuracy, and this tolerance often determines the extent of the resource savings that can be achieved. In this paper, we propose to improve the application scope, error resilience, and energy savings of inexact computing by combining it with hardware neural networks. These neural networks are fast emerging as popular candidate accelerators for future heterogeneous multicore platforms, and they have flexible error resilience limits owing to their ability to be trained. Our results in 65-nm technology demonstrate that the proposed inexact neural network accelerator could achieve 1.78-2.67× savings in energy consumption (with corresponding delay and area savings of 1.23× and 1.46×, respectively) compared to the existing baseline neural network implementation, at the cost of a small accuracy loss (mean squared error increases from 0.14 to 0.20 on average).
