Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor

Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor that takes advantage of its multi-level parallelism: many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information to score gene pairs and permutation testing to assess statistical significance. We demonstrate the first-ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575-gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimizations for parallelizing mutual information computation on the Intel Xeon Phi coprocessor offer lessons that are applicable to other domains.
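To make the scoring step concrete, the following is a minimal single-threaded sketch of mutual information with a permutation test for one pair of expression profiles. It is an illustration under stated assumptions, not the paper's implementation: it uses a simple equal-frequency histogram estimator, which may differ from the estimator TINGe uses, and it omits all threading and vectorization. The function names (`rank_bins`, `mutual_information`, `permutation_p_value`) and the defaults for `n_bins` and `n_perm` are hypothetical.

```python
import math
import random


def rank_bins(values, n_bins):
    """Map each value to an equal-frequency bin index based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by, n_bins):
    """Plug-in mutual information estimate (in nats) from two binned profiles."""
    m = len(bx)
    joint = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bx, by):
        joint[a][b] += 1
    px = [sum(row) for row in joint]  # marginal counts for x
    py = [sum(joint[a][b] for a in range(n_bins)) for b in range(n_bins)]
    mi = 0.0
    for a in range(n_bins):
        for b in range(n_bins):
            c = joint[a][b]
            if c:  # empty cells contribute nothing
                mi += (c / m) * math.log(c * m / (px[a] * py[b]))
    return mi


def permutation_p_value(x, y, n_bins=4, n_perm=200, seed=0):
    """Return (observed MI, permutation p-value) for one gene pair.

    Assumption: the null distribution is built by shuffling one profile,
    which breaks any dependence while preserving both marginals.
    """
    rng = random.Random(seed)
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    observed = mutual_information(bx, by, n_bins)
    shuffled = by[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(bx, shuffled, n_bins) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gene_a = [float(i) for i in range(40)]
    gene_b = [2.0 * v + 1.0 for v in gene_a]  # perfectly dependent on gene_a
    print(permutation_p_value(gene_a, gene_b))
```

A genome-scale run repeats this pairwise computation for every pair of genes, which is why the inner loops are the natural target for the core-level, thread-level, and vector-level parallelism described above.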
