SpiNNaker: Fault tolerance in a power- and area-constrained large-scale neuromimetic architecture

SpiNNaker is a biologically inspired, massively parallel computer designed to model up to a billion spiking neurons in real time. A full-scale implementation of a SpiNNaker system will comprise more than 10^5 integrated circuits (half of which are SDRAMs and half multi-core systems-on-chip). At this scale, some component failures are unavoidable, so fault tolerance is a foundation of the system design. Although the target application can tolerate a certain low level of failures, considerable effort has been devoted to incorporating different fault-tolerance techniques. This paper discusses how hardware and software mechanisms collaborate to make SpiNNaker operate properly even in the very likely scenario of component failures, and how it can tolerate system-degradation levels well above those expected.
