TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip

The new era of cognitive computing brings forth the grand challenge of developing systems capable of processing massive amounts of noisy multisensory data. This type of intelligent computing poses a set of constraints, including real-time operation, low power consumption, and scalability, which require a radical departure from conventional system design. Brain-inspired architectures offer tremendous promise in this area. To this end, we developed TrueNorth, a 65 mW real-time neurosynaptic processor that implements a non-von Neumann, low-power, highly parallel, scalable, and defect-tolerant architecture. With 4096 neurosynaptic cores, the TrueNorth chip contains 1 million digital neurons and 256 million synapses tightly interconnected by an event-driven routing infrastructure. The fully digital, 5.4-billion-transistor implementation leverages existing CMOS scaling trends while ensuring one-to-one correspondence between hardware and software. Given such aggressive design metrics, and because the TrueNorth architecture departs from prevailing architectures, conventional computer-aided design (CAD) tools could not be used for the design. As a result, we developed a novel design methodology that includes mixed asynchronous-synchronous circuits and a complete tool flow for building an event-driven, low-power neurosynaptic chip. The TrueNorth chip is fully configurable in terms of connectivity and neural parameters, allowing custom configurations for a wide range of cognitive and sensory perception applications. To reduce the system's communication energy, we have adapted existing application-agnostic very-large-scale integration (VLSI) CAD placement tools for mapping logical neural networks to the physical neurosynaptic core locations on the TrueNorth chips. We have successfully demonstrated the use of TrueNorth-based systems in multiple applications, including visual object recognition, with higher performance and orders of magnitude lower power consumption than the same algorithms run on von Neumann architectures. The TrueNorth chip and its tool flow serve as building blocks for future cognitive systems, and give designers an opportunity to develop novel brain-inspired architectures and systems based on the knowledge obtained from this paper.
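
To make the placement step concrete, the following is a minimal sketch of mapping logical cores onto a physical grid so that heavily communicating cores end up close together. It is not the adapted VLSI placement flow described above: the cost function (spike traffic weighted by Manhattan hop distance, used here as a stand-in for communication energy), the random pairwise-swap search, the 4 x 4 toy grid, and the names `hop_cost`, `improve_by_swaps`, and `demo` are all assumptions made for illustration.

```python
import random


def hop_cost(placement, edges):
    """Traffic-weighted Manhattan distance over all logical connections.

    placement: dict mapping logical core id -> (x, y) grid slot (hypothetical layout)
    edges: iterable of (src_core, dst_core, spike_rate) tuples
    """
    total = 0.0
    for src, dst, rate in edges:
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += rate * (abs(x1 - x2) + abs(y1 - y2))
    return total


def improve_by_swaps(placement, edges, iterations=2000, seed=0):
    """Hill climbing: swap two cores' slots and keep the swap only if cost drops.

    This is a deliberately simple stand-in for an analytical or
    partitioning-based placer, not the method used in the paper.
    """
    rng = random.Random(seed)
    placement = dict(placement)  # work on a copy
    cores = list(placement)
    best = hop_cost(placement, edges)
    for _ in range(iterations):
        a, b = rng.sample(cores, 2)
        placement[a], placement[b] = placement[b], placement[a]
        cost = hop_cost(placement, edges)
        if cost < best:
            best = cost
        else:
            # Revert swaps that do not reduce the cost.
            placement[a], placement[b] = placement[b], placement[a]
    return placement, best


def demo(side=4):
    """Place a chain of side*side logical cores on a shuffled side x side grid."""
    slots = [(x, y) for y in range(side) for x in range(side)]
    random.Random(1).shuffle(slots)
    placement = {core: slot for core, slot in enumerate(slots)}
    # Toy network: core i sends one unit of traffic to core i + 1.
    edges = [(i, i + 1, 1.0) for i in range(side * side - 1)]
    before = hop_cost(placement, edges)
    improved, after = improve_by_swaps(placement, edges)
    return before, after, improved


if __name__ == "__main__":
    before, after, _ = demo()
    print(f"cost before: {before}, cost after: {after}")
```

Because a swap is kept only when it lowers the cost, the final cost is never higher than the starting cost. A production placer would work at the scale of thousands of cores per chip and would need a more scalable search than random pairwise swaps.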
