Optimization and parallelization of tensor and ODE/PDE computations on GPU

We propose a multi-level GPU-based parallelization algorithm for the multi-compartment Hodgkin-Huxley (HH) model equation, whose solution requires solving a Hines matrix at every time step. We use a "parallel-in-time" algorithm (similar to the Parareal strategy) to obtain outer-level parallelism, and an Exact Domain Decomposition (EDD) algorithm with fine decomposition for inner-level parallelism. We show that our technique also applies to other differential equations that induce tridiagonal systems, such as the heat equation. Typically, a solution to the HH equation runs for hundreds to tens of thousands of time steps, solving a Hines matrix at each step. Previous solutions to this problem by Mascagni et al. (1991) and Hines et al. (2008) parallelize only the Hines matrix solve. Our approach uses the dynamic parallelism of CUDA to achieve multi-level parallelism on GPUs. Our solution outperforms the sequential-in-time method on standard neuron morphologies by up to 2.5x. We also show that the iterative part of the Parareal method converges in 5-7 iterations on average to an accuracy of 10^-6. We also propose a GPU optimization for the Higher Order Tensor Renormalization Group (HOTRG) problem, in which the tensor contraction operations inside HOTRG are optimized by a multi-GPU implementation using the cuBLASXt API.
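The two levels of parallelism described above can be illustrated on the heat equation, the tridiagonal example the abstract mentions. The following is a minimal sequential Python sketch, not the authors' CUDA implementation: the Thomas algorithm stands in for the EDD/Hines solver as the inner tridiagonal solve, backward Euler is an assumed time integrator, and the names `thomas`, `propagate`, and `parareal` are hypothetical. The grid is assumed to cover the unit interval with zero Dirichlet boundaries.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def propagate(u, t_span, steps, alpha=1.0):
    """Advance u_t = alpha * u_xx by t_span using backward Euler steps.

    Assumes interior points of a uniform grid on [0, 1] with zero boundaries.
    Each step solves one tridiagonal system.
    """
    n = len(u)
    dx = 1.0 / (n + 1)
    r = alpha * (t_span / steps) / dx ** 2
    lower, diag, upper = [-r] * n, [1.0 + 2.0 * r] * n, [-r] * n
    u = list(u)
    for _ in range(steps):
        u = thomas(lower, diag, upper, u)
    return u

def parareal(u0, t_end, n_slices, fine_steps, iters):
    """Parareal iteration: returns the states at the n_slices + 1 slice boundaries."""
    dT = t_end / n_slices
    coarse = lambda u: propagate(u, dT, 1)          # cheap, one step per slice
    fine = lambda u: propagate(u, dT, fine_steps)   # accurate, many steps per slice
    U = [list(u0)]
    for n in range(n_slices):  # initial sequential coarse sweep
        U.append(coarse(U[n]))
    for _ in range(iters):
        # The fine solves are independent across slices: this is the
        # outer-level (parallel-in-time) work; here it runs sequentially.
        F = [fine(U[n]) for n in range(n_slices)]
        G_old = [coarse(U[n]) for n in range(n_slices)]
        new = [list(u0)]
        for n in range(n_slices):  # sequential coarse correction
            g_new = coarse(new[n])
            new.append([g + f - go for g, f, go in zip(g_new, F[n], G_old[n])])
        U = new
    return U

# Illustrative parameters (assumptions, not taken from the paper).
n_points, t_end, n_slices, fine_steps = 15, 0.1, 4, 10
dx = 1.0 / (n_points + 1)
u0 = [math.sin(math.pi * (i + 1) * dx) for i in range(n_points)]
reference = propagate(u0, t_end, n_slices * fine_steps)  # sequential fine solve
result = parareal(u0, t_end, n_slices, fine_steps, iters=n_slices)[-1]
max_err = max(abs(a - b) for a, b in zip(result, reference))
print(max_err)
```

With as many iterations as time slices, Parareal reproduces the sequential fine solution up to rounding, so this setting only checks correctness. Any speedup depends on converging in far fewer iterations than slices, as the 5-7 iterations reported above suggest. In a GPU setting the independent fine solves would presumably map to concurrently launched kernels, with the tridiagonal solve parallelized inside each one.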

[1] Gary Smith, et al. High-Level Synthesis: Past, Present, and Future, 2009, IEEE Design & Test of Computers.

[2] Michael L. Hines, et al. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors, 2008, Journal of Computational Neuroscience.

[3] Peter Dayan, et al. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, 2001.

[4] A. Hodgkin, et al. A quantitative description of membrane current and its application to conduction and excitation in nerve, 1952, The Journal of Physiology.

[5] Wolfgang Hackbusch, et al. Parabolic multi-grid methods, 1985.

[6] Dharma Teja Vooturi, et al. Parallelizing Hines Matrix Solver in Neuron Simulations on GPU, 2017, 2017 IEEE 24th International Conference on High Performance Computing (HiPC).

[7] Z. Y. Xie, et al. Coarse-graining renormalization by higher-order singular value decomposition, 2012, arXiv:1201.1144.

[8] Harold S. Stone, et al. Parallel Tridiagonal Equation Solvers, 1975, TOMS.

[9] White, et al. Density matrix formulation for quantum renormalization groups, 1992, Physical Review Letters.

[10] Z. Y. Xie, et al. Second renormalization of tensor-network states, 2008, Physical Review Letters.

[11] Véronique Martin, et al. An optimized Schwarz waveform relaxation method for the unsteady convection diffusion equation in two dimensions, 2004.

[12] Yu-An Chen, et al. Density matrix renormalization group, 2014.

[13] Richard M. Fujimoto, et al. Parallel event-driven neural network simulations using the Hodgkin-Huxley neuron model, 2005, Workshop on Principles of Advanced and Distributed Simulation (PADS'05).

[14] Rolf Krause, et al. A space-time parallel solver for the three-dimensional heat equation, 2013, PARCO.

[15] Philippe Coussy, et al. High-Level Synthesis: from Algorithm to Digital Circuit, 2008.

[16] Charbel Farhat, et al. A time-parallel implicit method for accelerating the solution of non-linear structural dynamics problems, 2009.

[17] Michael L. Hines, et al. Fully implicit parallel simulation of single neurons, 2008, Journal of Computational Neuroscience.

[18] Michael Levin, et al. Tensor renormalization group approach to two-dimensional classical lattice models, 2006, Physical Review Letters.

[19] M. Hines, et al. Efficient computation of branched nerve equations, 1984, International Journal of Bio-Medical Computing.

[20] Louis W. Ehrlich, et al. A Numerical Method of Solving a Heat Flow Problem with Moving Boundary, 1958, JACM.

[21] Stefan Güttel. A Parallel Overlapping Time-Domain Decomposition Method for ODEs, 2013, Domain Decomposition Methods in Science and Engineering XX.

[22] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[23] Ahmed H. Sameh, et al. A parallel hybrid banded system solver: the SPIKE algorithm, 2006, Parallel Comput.

[24] Martin J. Gander, et al. Nonlinear Convergence Analysis for the Parareal Algorithm, 2008.

[25] Michael Mascagni, et al. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models, 1991, Journal of Neuroscience Methods.

[26] Michael L. Minion, et al. Toward an Efficient Parallel in Time Method for Partial Differential Equations, 2012.

[27] Josep-Lluís Larriba-Pey, et al. An Analysis of the Parallel Computation of Arbitrarily Branched Cable Neuron Models, 1995, PPSC.

[28] Alon Korngreen, et al. Accelerating compartmental modeling on a graphical processing unit, 2013, Front. Neuroinform.

[29] Jürg Nievergelt, et al. Parallel methods for integrating ordinary differential equations, 1964, CACM.

[30] W. Miranker, et al. Parallel methods for the numerical integration of ordinary differential equations, 1967.

[31] Colin B. Macdonald, et al. Parallel High-Order Integrators, 2010, SIAM J. Sci. Comput.

[32] Harold S. Stone, et al. An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations, 1973, JACM.

[33] Gene H. Golub, et al. Matrix computations (3rd ed.), 1996.

[34] Ahmed Sameh, et al. SPIKE: A parallel environment for solving banded linear systems, 2007.

[35] Rolf Krause, et al. A massively space-time parallel N-body solver, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.