The improbable but highly appropriate marriage of 3D stacking and neuromorphic accelerators

3D stacking is a promising technology, offering low latency, low power, small area, and high bandwidth; its main shortcoming is increased power density. At the same time, driven by energy constraints, architectures are evolving towards greater customization, with tasks delegated to accelerators. Because machine-learning algorithms are now widely used, and neural networks (NNs) have re-emerged as the preferred class of such algorithms, NN accelerators are receiving increased attention. They turn out to be well matched to 3D stacking: they are inherently 3D structures with low power density and high across-layer bandwidth requirements. We present what is, to the best of our knowledge, the first 3D-stacked NN accelerator.
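The power-density and bandwidth argument can be made concrete with a back-of-envelope sketch. It assumes one NN layer per die, identical dies sharing one footprint, and every output of a layer forwarded to the next die on each evaluation. The function names and all numbers are hypothetical illustrations, not figures from this work.

```python
"""Back-of-envelope estimates for stacking NN layers on separate dies.

All parameter values below are hypothetical, chosen only to illustrate
the arithmetic; none are taken from the accelerator described here.
"""


def stacked_power_density(die_power_w: float, footprint_mm2: float, num_dies: int) -> float:
    """Power per unit footprint (W/mm^2) when num_dies identical dies share one footprint."""
    if footprint_mm2 <= 0 or num_dies < 1:
        raise ValueError("footprint must be positive and num_dies >= 1")
    return num_dies * die_power_w / footprint_mm2


def cross_layer_bandwidth_bps(layer_outputs: int, bits_per_value: int, evals_per_s: float) -> float:
    """Bits per second crossing from one die to the next, assuming each die holds
    one NN layer and all of its outputs are forwarded on every evaluation."""
    return layer_outputs * bits_per_value * evals_per_s


def parallel_vertical_links(layer_outputs: int, bits_per_value: int) -> int:
    """Vertical connections needed to forward all outputs of a layer in one cycle."""
    return layer_outputs * bits_per_value


if __name__ == "__main__":
    # Hypothetical 4-die stack, each die dissipating 0.5 W over a 10 mm^2 footprint.
    print(stacked_power_density(0.5, 10.0, 4))  # 0.2 (W/mm^2)
    # Hypothetical 1024-neuron layer with 16-bit outputs, evaluated 1e6 times per second.
    print(cross_layer_bandwidth_bps(1024, 16, 1e6))  # 16384000000.0 (bits/s)
    print(parallel_vertical_links(1024, 16))  # 16384
```

Stacking multiplies power per footprint by the number of dies, so a design whose individual dies start with low power density leaves the most headroom; the vertical-link count shows why a layer-per-die mapping depends on a large number of inter-die connections.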