Efficient sparse matrix-delayed vector multiplication for discretized neural field model

Computational models of the human brain provide an important tool for studying the principles behind brain function and disease. To achieve whole-brain simulation, models are formulated at the level of neuronal populations as systems of delayed differential equations. In this paper, we show that the integration of large systems of sparsely connected neural masses is similar to the well-studied sparse matrix-vector multiplication; however, because of the delayed contributions, it differs in the data access pattern to the vectors. To improve data locality, we propose a combination of node reordering and tiled schedules derived from the connectivity matrix of the particular system, which allows multiple integration steps to be performed within a tile. We present two schedules: one with serial processing of the tiles and one allowing parallel processing of the tiles. We evaluate the presented schedules, showing speedups of up to $2\times$ on a single-socket CPU and up to $1.25\times$ on a Xeon Phi accelerator.
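To illustrate how the delayed product differs from an ordinary sparse matrix-vector product, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a CSR-style connectivity matrix in which every nonzero carries an integer delay measured in integration steps, and a ring buffer of past state vectors. The function name `delayed_spmv`, the array names, and the ring-buffer layout are all assumptions made for illustration; node reordering and tiling are not shown.

```python
def delayed_spmv(indptr, indices, weights, delays, history, step):
    """Compute y[i] = sum_k weights[k] * x_j(step - delays[k]) for row i.

    history is a ring buffer (hypothetical layout): history[t % H] holds
    the state vector at time step t, where H = len(history) must exceed
    the largest delay.
    """
    n_rows = len(indptr) - 1
    n_slots = len(history)
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            # Unlike plain SpMV, each nonzero reads from a different
            # past state vector, selected by its own delay.
            slot = (step - delays[k]) % n_slots
            acc += weights[k] * history[slot][indices[k]]
        y[i] = acc
    return y


# Toy system with 3 nodes and a maximum delay of 2 steps (so H = 3).
indptr = [0, 2, 3, 3]        # row 0 has two inputs, row 1 one, row 2 none
indices = [1, 2, 0]          # source node of each connection
weights = [0.5, 2.0, 1.0]    # connection strengths
delays = [1, 2, 0]           # per-connection delay in steps
history = [
    [1.0, 2.0, 3.0],         # slot 0: state at step 3
    [10.0, 20.0, 30.0],      # slot 1: state at step 4
    [100.0, 200.0, 300.0],   # slot 2: state at step 5
]
print(delayed_spmv(indptr, indices, weights, delays, history, step=5))
```

In this sketch, the inner loop gathers from up to `len(history)` different vectors rather than from a single input vector, which is the irregular access pattern that the abstract's reordering and tiling aim to make more cache-friendly.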
