Out-of-Core GPU-Accelerated Causal Structure Learning

Learning causal structures from high-dimensional datasets enables deriving advanced insights from observational data. For example, gene regulatory networks inferred from gene expression data support solving biological and biomedical problems, such as those in drug design or diagnostics. With the adoption of Graphics Processing Units (GPUs), the runtime of constraint-based causal structure learning algorithms on multivariate normally distributed data has been significantly reduced. For extremely high-dimensional datasets, such as those provided by The Cancer Genome Atlas (TCGA), state-of-the-art GPU-accelerated algorithms exceed the device memory of a single GPU, and execution consequently fails. To overcome this limitation, we propose an out-of-core algorithm for GPU-accelerated constraint-based causal structure learning on multivariate normally distributed data. We experimentally validate the scalability of our algorithm beyond GPU device memory capacities and compare our implementation to a baseline using Unified Memory (UM). In recent GPU generations, UM overcomes the device memory limit by utilizing the GPU page migration engine. On a real-world gene expression dataset from TCGA, our approach outperforms the baseline by a factor of 95 and is faster than a parallel Central Processing Unit (CPU)-based version by a factor of 236.
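Constraint-based algorithms such as PC remove an edge between two variables when a conditional independence test accepts independence. For multivariate normal data, a common choice is Fisher's z-test on partial correlations computed from the correlation matrix C. The following is a minimal CPU-only sketch under stated assumptions, not the authors' GPU implementation. It assumes C is a dense list of lists and m is the number of observations. It pairs that test with a hypothetical row-blocked level-0 pass (empty conditioning set) that reads only one block of rows of C at a time, as a toy illustration of the out-of-core idea. The names `partial_corr`, `fisher_z_independent`, `gaussian_ci_test`, `level0_tiled`, and `block_rows` are all illustrative.

```python
import math
from statistics import NormalDist


def partial_corr(C, i, j, S):
    """Partial correlation of i and j given the tuple S, using the recursive
    formula r_ij|S = (r_ij|S' - r_ik|S' * r_jk|S') / sqrt((1-r_ik|S'^2)(1-r_jk|S'^2)),
    where k = S[0] and S' = S[1:]. Exponential in |S|; fine for a toy example."""
    if not S:
        return C[i][j]
    k, rest = S[0], S[1:]
    r_ij = partial_corr(C, i, j, rest)
    r_ik = partial_corr(C, i, k, rest)
    r_jk = partial_corr(C, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))


def fisher_z_independent(r, m, s, alpha=0.05):
    """Fisher's z-test: True if the (partial) correlation r, estimated from
    m samples with a conditioning set of size s, is judged to be zero."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # guard against |r| == 1
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(m - s - 3) * abs(z)
    return stat <= NormalDist().inv_cdf(1 - alpha / 2)


def gaussian_ci_test(C, m, i, j, S, alpha=0.05):
    """Conditional independence test of i and j given S on Gaussian data."""
    return fisher_z_independent(partial_corr(C, i, j, S), m, len(S), alpha)


def level0_tiled(C, m, block_rows, alpha=0.05):
    """Hypothetical out-of-core level-0 pass: visit C one block of rows at a
    time (block_rows stands in for what fits in device memory) and return
    the set of removed edges (i, j) with i < j."""
    n = len(C)
    removed = set()
    for start in range(0, n, block_rows):
        tile = C[start:start + block_rows]  # in a real system: copy block to GPU
        for offset, row in enumerate(tile):
            i = start + offset
            for j in range(i + 1, n):
                if fisher_z_independent(row[j], m, 0, alpha):
                    removed.add((i, j))
    return removed
```

For a chain X0 → X1 → X2 with correlations 0.5, 0.5, and 0.25, `gaussian_ci_test(C, 1000, 0, 2, ())` rejects independence, while conditioning on X1 drives the partial correlation to zero and the test accepts it. Because level-0 tests read only row i of C, any `block_rows` yields the same removed edges; higher levels need rows for the conditioning variables too, which is where out-of-core scheduling becomes harder.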
