GPU Implementation of Pairwise Gaussian Mixture Models for Multi-Modal Gene Co-Expression Networks

Gene co-expression networks (GCNs) are widely used in bioinformatics research to perform system-level analyses of organisms based on the pairwise correlation between all expressed genes. For large datasets that contain samples from multiple sources, gene pairs can exhibit multiple modes of co-expression, which confound typical correlation approaches. A clustering method such as Gaussian Mixture Models (GMMs) may be used to separate the modes of each gene pair in an unsupervised manner, prior to computing the correlation of each mode. However, pairwise clustering significantly increases the computational cost of constructing a GCN, as several clustering models must be evaluated for each gene pair, and the number of gene pairs grows quadratically with the number of genes. In this paper, we present a heterogeneous, high-throughput multi-CPU/GPU software package for multi-modal GCN construction, implemented in version 3 of the Knowledge Independent Network Construction (KINC) software. We determine the optimal values for several execution parameters of the GPU implementation, and we benchmark our CPU and GPU implementations for up to 8 CPUs/GPUs. Our GPU implementation achieves a $167\times$ speedup over the corresponding CPU implementation, as well as a $500\times$ speedup over KINCv1.
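To illustrate why multiple modes confound a single correlation value, the following minimal Python sketch compares the pooled Pearson correlation of one gene pair with the correlation computed separately within each mode. It is not the KINC implementation: the expression values are made up, the cluster labels are assigned by hand as a stand-in for the output of a fitted GMM, and the function names (`pearson`, `per_mode_correlation`, `num_gene_pairs`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def per_mode_correlation(xs, ys, labels):
    """Group samples by cluster label, then correlate within each group.

    In a real pipeline the labels would come from a clustering step such
    as a GMM; here they are supplied by the caller (an assumption).
    """
    modes = {}
    for x, y, k in zip(xs, ys, labels):
        gx, gy = modes.setdefault(k, ([], []))
        gx.append(x)
        gy.append(y)
    return {k: pearson(gx, gy) for k, (gx, gy) in modes.items()}


def num_gene_pairs(n_genes):
    """Number of unordered gene pairs: n(n - 1) / 2."""
    return n_genes * (n_genes - 1) // 2


# Hypothetical expression values for one gene pair across 8 samples.
# Samples 0-3 are positively co-expressed; samples 4-7 are negatively
# co-expressed.
gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
gene_b = [1, 2, 3, 4, 4, 3, 2, 1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hand-assigned stand-in for GMM output

pooled = pearson(gene_a, gene_b)
by_mode = per_mode_correlation(gene_a, gene_b, labels)

print("pooled:", pooled)
print("per mode:", by_mode)
print("pairs for 10,000 genes:", num_gene_pairs(10_000))
```

In this toy data the two modes cancel, so the pooled correlation is zero even though each mode is perfectly correlated on its own. The pair count for 10,000 genes is 49,995,000, and each pair would require fitting several candidate mixture models, which is the cost that motivates a GPU implementation.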
