MPIGeneNet: Parallel Calculation of Gene Co-Expression Networks on Multicore Clusters

In this work, we present <italic>MPIGeneNet</italic>, a parallel tool that applies Pearson's correlation and Random Matrix Theory to construct gene co-expression networks. It is based on the state-of-the-art sequential tool <italic>RMTGeneNet</italic>, which provides networks with high robustness and sensitivity at the expense of relatively long runtimes for large-scale input datasets. <italic>MPIGeneNet</italic> returns the same results as <italic>RMTGeneNet</italic> but improves memory management, reduces I/O cost, and accelerates the two most computationally demanding steps of co-expression network construction by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on two different systems using three typical input datasets shows that <italic>MPIGeneNet</italic> is significantly faster than <italic>RMTGeneNet</italic>. As an example, our tool is up to 175.41 times faster on a cluster with eight nodes, each containing two 12-core Intel Haswell processors. The source code of <italic>MPIGeneNet</italic>, as well as a reference manual, is available at <uri>https://sourceforge.net/projects/mpigenenet/</uri>.
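
To make the correlation step concrete, the following is a minimal sequential sketch in Python, not the MPIGeneNet implementation. It computes Pearson's correlation for every gene pair and keeps the pairs whose absolute correlation reaches a cutoff. The function names, the toy expression profiles, and the fixed cutoff of 0.9 are all assumptions made for illustration; in the tool described above the cutoff is chosen with Random Matrix Theory, which is not shown here.

```python
import math


def pearson(x, y):
    """Pearson's correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has undefined correlation; treated as 0 here
        # (an assumption of this sketch).
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expr, threshold):
    """Return (gene_i, gene_j, r) for each pair with |r| >= threshold.

    `expr` maps a gene name to its list of expression values.
    The threshold is a fixed, hypothetical value in this sketch.
    """
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:  # each unordered pair is visited once
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges


# Toy data (hypothetical): four genes measured in four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 2.0],
}
edges = coexpression_edges(toy, threshold=0.9)
for g, h, r in edges:
    print(f"{g} -- {h}: r = {r:+.2f}")
```

The pairwise loop grows quadratically with the number of genes, which is why this step is a natural candidate for distribution across the cores and nodes of a cluster.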
