GPU code generation for ODE-based applications with phased shared-data access patterns

We present a novel code generation scheme for GPUs. Its key feature is the platform-aware generation of a heterogeneous pool of threads. This exposes more data-sharing opportunities among the concurrent threads and reduces the memory requirements that would otherwise exceed the capacity of the on-chip memory. Instead of following the conventional strategy of exposing as much parallelism as possible, our scheme exploits the phased nature of the memory access patterns found in many applications that exhibit massive parallelism. We demonstrate the effectiveness of our code generation strategy on a computational systems biology application. This application consists of computing a Dynamic Bayesian Network (DBN) approximation of the dynamics of signalling pathways described as a system of Ordinary Differential Equations (ODEs). The approximation algorithm involves (i) sampling many times (on the order of a few million) from the set of initial states, (ii) generating trajectories through numerical integration, and (iii) storing the statistical properties of this set of trajectories in the Conditional Probability Tables (CPTs) of a DBN via a prespecified discretization of the time and value domains. The trajectories can be computed in parallel. However, the intermediate data needed for computing them, as well as the entries for the CPTs, are too large to be stored in on-chip memory. Our experiments show that the proposed code generation scheme scales well, achieving significant performance improvements on three realistic signalling pathway models. These results suggest how our scheme could be extended to deal with other applications involving systems of ODEs.
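
The three steps of the approximation algorithm can be illustrated with a minimal sequential sketch in Python. It is not the GPU code generation scheme itself and makes several assumptions that do not come from the paper: a single-variable toy ODE dx/dt = -k*x stands in for a signalling pathway model, integration uses a fixed-step fourth-order Runge-Kutta method, the value domain [0, 1] is split into equal-width bins, and every name (`simulate`, `to_bin`, `build_cpts`) is hypothetical.

```python
import random


def simulate(x0, k, dt, steps_per_point, n_points):
    """Integrate the toy ODE dx/dt = -k*x with fixed-step RK4.

    Returns the value at time point 0 and at each of the n_points
    subsequent discretized time points.
    """
    def f(x):
        return -k * x

    xs = [x0]
    x = x0
    for _ in range(n_points):
        for _ in range(steps_per_point):
            k1 = f(x)
            k2 = f(x + 0.5 * dt * k1)
            k3 = f(x + 0.5 * dt * k2)
            k4 = f(x + dt * k3)
            x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


def to_bin(x, lo, hi, n_bins):
    """Map a value to the index of its equal-width interval in [lo, hi]."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def build_cpts(n_samples, n_points, n_bins, seed=0):
    """Estimate one CPT per time step from sampled trajectories.

    cpts[t][i][j] approximates P(bin j at time t+1 | bin i at time t).
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0  # assumed value domain of the single variable
    counts = [[[0] * n_bins for _ in range(n_bins)] for _ in range(n_points)]

    for _ in range(n_samples):
        # (i) sample an initial state
        x0 = rng.uniform(lo, hi)
        # (ii) generate a trajectory by numerical integration
        traj = simulate(x0, k=1.0, dt=0.01, steps_per_point=10,
                        n_points=n_points)
        # (iii) discretize and accumulate transition counts
        bins = [to_bin(x, lo, hi, n_bins) for x in traj]
        for t in range(n_points):
            counts[t][bins[t]][bins[t + 1]] += 1

    # Normalize each row of counts into a conditional distribution.
    cpts = []
    for t in range(n_points):
        table = []
        for row in counts[t]:
            total = sum(row)
            table.append([c / total if total else 0.0 for c in row])
        cpts.append(table)
    return cpts


if __name__ == "__main__":
    cpts = build_cpts(n_samples=2000, n_points=3, n_bins=5)
    for row in cpts[0]:
        print(["%.2f" % p for p in row])
```

In this sketch the loop over samples is the part that can run in parallel, since each trajectory is independent. The `counts` tables are the shared data that every trajectory updates, which hints at why the CPT entries become a memory concern once the number of variables, time points, and bins grows.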