Toward General-Purpose Code Acceleration with Analog Computation

We propose a solution, spanning circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate “approximable” code, that is, code that can tolerate imprecise execution. We use an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an “analog” neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a novel hardware/software interface, neural-network training techniques, and compiler support. Analog neural acceleration provides a whole-application speedup of 3.3× and energy savings of 12.1×, with quality loss below 10% for all but one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of emerging applications.
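To make two of the listed constraints concrete, restricted-range value encoding and limited-precision computation, the following is a minimal software sketch of a small neural network evaluated with quantized values. It is not the paper's circuit or compiler flow. The function names (`quantize`, `neuron`, `mlp_forward`), the 8-bit default, the [-1, 1] range, and the example weights are all illustrative assumptions.

```python
import math

def quantize(x, bits=8, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi], then round to one of 2**bits evenly spaced levels.

    The range and bit width are assumptions chosen for illustration.
    """
    x = min(max(x, lo), hi)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, bits=8):
    """Weighted sum of quantized inputs and weights, then a quantized sigmoid."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += quantize(x, bits) * quantize(w, bits)
    # The sigmoid output lies in (0, 1), so it is encoded over [0, 1].
    return quantize(sigmoid(acc), bits, 0.0, 1.0)

def mlp_forward(inputs, layers, bits=8):
    """Evaluate a multilayer perceptron; each layer is a list of (weights, bias)."""
    values = list(inputs)
    for layer in layers:
        values = [neuron(values, w, b, bits) for (w, b) in layer]
    return values

# Hypothetical 2 -> 2 -> 1 topology with made-up weights.
example_layers = [
    [([0.5, -0.25], 0.1), ([-0.75, 0.5], 0.0)],
    [([0.9, -0.6], -0.2)],
]
example_output = mlp_forward([0.3, -0.8], example_layers)
```

Lowering `bits` coarsens every intermediate value, which gives a rough software analogue of how limited precision can degrade output quality. In the approach described above, training techniques are used to compensate for such limitations.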
