A programmable hardware accelerator for simulating dynamical systems

The fast and energy-efficient simulation of dynamical systems defined by coupled ordinary/partial differential equations (ODEs/PDEs) has emerged as an important problem. Accelerated simulation of coupled ODEs/PDEs is critical both for the analysis of physical systems and for computing with dynamical systems. This paper presents a fast, programmable accelerator for simulating dynamical systems. The computing model of the proposed platform is based on a multilayer cellular nonlinear network (CeNN) augmented with nonlinear function evaluation engines. The platform can be programmed to accelerate wide classes of ODEs/PDEs by modulating the connectivity within the multilayer CeNN engine. An innovative hardware architecture that combines data reuse, a memory hierarchy, and near-memory processing is designed to accelerate the augmented multilayer CeNN. A dataflow model is presented, supported by an optimized memory hierarchy for efficient function evaluation. The proposed solver is designed and synthesized in 15 nm technology for hardware analysis. Its performance is evaluated against GPU nodes on wide classes of differential equations, and its power consumption is analyzed to show orders-of-magnitude improvement in energy efficiency.
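As a rough illustration of the CeNN computing model, the sketch below advances a single CeNN layer with forward Euler, using a 3x3 template that encodes the 5-point discrete Laplacian, so that the layer integrates the 2D heat equation. It is not the paper's architecture: the names `cenn_step`, `simulate_heat`, and `LAPLACIAN`, the periodic boundary, the unit grid spacing, and the step size are all assumptions made for illustration. The optional `nonlinear` callback is a hypothetical software stand-in for a per-cell nonlinear function evaluation.

```python
# Hypothetical sketch of one CeNN layer; not the accelerator described above.
# Assumptions: unit grid spacing, periodic boundaries, forward-Euler integration.

# 3x3 feedback template encoding the 5-point discrete Laplacian.
LAPLACIAN = [[0.0, 1.0, 0.0],
             [1.0, -4.0, 1.0],
             [0.0, 1.0, 0.0]]


def cenn_step(state, template, dt, nonlinear=None):
    """Advance a 2D grid of cell states by one forward-Euler step.

    Each cell follows dx/dt = sum(template * 3x3 neighbourhood) + nonlinear(x).
    Changing the template weights changes which equation is solved.
    """
    rows, cols = len(state), len(state[0])
    new_state = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    weight = template[di + 1][dj + 1]
                    if weight != 0.0:
                        # Modulo indexing implements the periodic boundary.
                        acc += weight * state[(i + di) % rows][(j + dj) % cols]
            if nonlinear is not None:
                acc += nonlinear(state[i][j])
            new_state[i][j] = state[i][j] + dt * acc
    return new_state


def simulate_heat(n=8, steps=20, diffusivity=1.0, dt=0.1):
    """Diffuse a unit impulse placed at the centre of an n x n grid."""
    grid = [[0.0] * n for _ in range(n)]
    grid[n // 2][n // 2] = 1.0
    template = [[diffusivity * w for w in row] for row in LAPLACIAN]
    for _ in range(steps):
        grid = cenn_step(grid, template, dt)
    return grid
```

With these defaults the product `dt * diffusivity` is 0.1, which stays below the explicit-scheme stability limit of 0.25 for unit spacing. Passing, for example, `nonlinear=lambda x: x - x**3` to `cenn_step` would add a cubic reaction term to the diffusion, and a coupled system would use one such layer per state variable.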
