The improbable but highly appropriate marriage of 3D stacking and neuromorphic accelerators

3D stacking is a promising technology, offering low latency, power, and area together with high bandwidth; its main shortcoming is increased power density. At the same time, motivated by energy constraints, architectures are evolving towards greater customization, with tasks delegated to accelerators. Due to the widespread use of machine-learning algorithms and the re-emergence of neural networks (NNs) as the preferred class of such algorithms, NN accelerators are receiving increased attention. They turn out to be well matched to 3D stacking: they are inherently 3D structures with low power density and high across-layer bandwidth requirements. We present what is, to the best of our knowledge, the first 3D-stacked NN accelerator.
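To make the matching argument concrete, here is a back-of-the-envelope sketch, not the paper's model. It assumes that every stacked die shares one footprint, so the heat of all layers must leave through the same area, and that each NN layer's outputs are sent to the die above once per inference. The `Die` class, both helper functions, and all numbers in the usage example are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Die:
    power_w: float   # power dissipated by one die (hypothetical value)
    area_mm2: float  # footprint of the die (hypothetical value)


def stacked_power_density(dies):
    """Footprint power density (W/mm^2) of a stack of dies.

    Assumption: all dies share one footprint, so their powers add
    while the area through which heat escapes stays the same.
    """
    footprint = max(d.area_mm2 for d in dies)
    return sum(d.power_w for d in dies) / footprint


def across_layer_bandwidth_gbps(neurons, bits_per_activation, inferences_per_s):
    """Bandwidth (Gb/s) needed to send one NN layer's outputs to the next die.

    Assumption: every neuron's activation crosses the die boundary once
    per inference.
    """
    return neurons * bits_per_activation * inferences_per_s / 1e9


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the trend.
    hot_stack = [Die(power_w=20.0, area_mm2=100.0)] * 4  # high-power logic dies
    nn_stack = [Die(power_w=2.0, area_mm2=100.0)] * 4    # low-power NN dies
    print(stacked_power_density(hot_stack))              # 0.8 W/mm^2
    print(stacked_power_density(nn_stack))               # 0.08 W/mm^2
    print(across_layer_bandwidth_gbps(4096, 16, 1_000_000))  # 65.536 Gb/s
```

Under these made-up numbers, stacking four high-power dies quadruples the footprint power density, while a stack of low-power NN dies stays an order of magnitude lower; the bandwidth helper shows why mapping NN layers onto separate dies calls for the dense vertical interconnect that stacking provides.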