Socrates-D 2.0: A Low Power High Throughput Architecture for Deep Network Training

Specialized ultra-low power deep learning architectures with on-chip training capability can be useful in a variety of applications that require adaptability. This paper presents such a processor design, Socrates-D 2.0, a multicore architecture for deep neural network training and inference. The architecture consists of a set of processing cores, each with internal memories to store synaptic weights. Additionally, we present a method to map traditional deep learning networks onto our multicore architecture and show that it has minimal impact on training accuracy. The system-level area and power benefits of the specialized architecture are compared with those of the earlier generation of Socrates-D. Our experimental evaluations show that the proposed architecture provides 1.25× better area efficiency and 1.19× better energy efficiency than the previous version of Socrates-D.
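
As a rough illustration of the neuron-per-core style of mapping the abstract alludes to, the Python sketch below partitions a fully connected layer's weight matrix across a fixed number of cores, each holding only its own rows in a local weight memory, and computes the layer output from per-core partial results. The layer sizes, core count, and sigmoid activation are illustrative assumptions, not the paper's actual mapping procedure.

```python
import numpy as np

# Hypothetical sizes (not from the paper): a fully connected layer with
# 784 inputs and 256 output neurons, mapped onto 8 cores. Each core keeps
# the weight rows for its share of the neurons in its own local memory.
N_IN, N_OUT, N_CORES = 784, 256, 8
assert N_OUT % N_CORES == 0
NEURONS_PER_CORE = N_OUT // N_CORES

rng = np.random.default_rng(0)
weights = rng.standard_normal((N_OUT, N_IN)).astype(np.float32)

# Partition by output neuron: core c stores rows
# [c * NEURONS_PER_CORE, (c + 1) * NEURONS_PER_CORE).
core_weights = [weights[c * NEURONS_PER_CORE:(c + 1) * NEURONS_PER_CORE]
                for c in range(N_CORES)]

def forward(x):
    """Broadcast the input to every core and concatenate the partial outputs."""
    partial = [w @ x for w in core_weights]                  # per-core matrix-vector product
    return 1.0 / (1.0 + np.exp(-np.concatenate(partial)))    # sigmoid activation

x = rng.standard_normal(N_IN).astype(np.float32)
y = forward(x)
print(y.shape)  # (256,)
```

In such a partitioning, only the layer input needs to be broadcast to all cores; the weights themselves never leave the core that owns them, which is the usual motivation for keeping synaptic weights in per-core memories.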
