More accurate models for detecting gene-gene interactions from public expression compendia

The rapid accumulation of public gene expression data, made possible by high-throughput technology, provides an unprecedented opportunity to identify functionally related genes by analyzing their co-expression patterns. However, these data are typically noisy and highly heterogeneous, which complicates their use in constructing co-expression networks from large expression compendia. Previous studies suggested that collective gene expression patterns can be better modeled by Gaussian mixtures. This motivates the present work, which proposes Multimodal Mutual Information (MMI) to reconstruct gene co-expression networks from public gene expression data. MMI assumes that each gene pair follows a bivariate Gaussian mixture model and categorizes the samples into distinct bins according to their expression magnitude. Two kinds of correlation are computed and aggregated in MMI, capturing both the dependency between the discretized bins and the expression correlation within each bin. In extensive simulations, MMI outperforms other approaches in detecting gene-gene interactions, regardless of the noise level or the strength of the interactions. The advantage of MMI is further validated on three real problems: (1) Inferring novel gene functions from their connections with well-documented genes. We apply principal component analysis to the correlation matrices generated by MMI and by Pearson correlation, and construct transcriptional components to evaluate gene-gene interactions. MMI yields 1.7 times more eligible transcriptional components than Pearson correlation, and these components can be used to predict gene functions. (2) Prioritizing candidate genes for an affected pedigree. MMI calculates the interactions between candidate genes and established disease genes and identifies KIF1A as a new causal gene for pure hereditary spastic paraparesis. (3) Detecting disease “hot genes”. MMI identifies ANK2 as a “hot gene” for autism spectrum disorders by evaluating its co-expression with other disease susceptibility genes derived from trio-based exome sequencing data.
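The abstract does not give the MMI formula, so the following Python sketch only illustrates the general idea of combining a discretized dependency with per-bin expression correlations. It rests on several assumptions that are not from the paper: each gene is fitted separately with a two-component one-dimensional Gaussian mixture (a simplification of the bivariate mixture described above), bins are the joint hard assignments of the two genes, and the two terms are simply added. The names `fit_two_gaussians`, `discrete_mi`, `toy_mmi`, and the parameter `min_bin` are hypothetical.

```python
import numpy as np

def fit_two_gaussians(v, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM; return hard labels (0/1)."""
    v = np.asarray(v, dtype=float)
    mu = np.percentile(v, [25, 75]).astype(float)   # initial component means
    sd = np.full(2, v.std() + 1e-9)                  # initial standard deviations
    w = np.array([0.5, 0.5])                         # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities (the 1/sqrt(2*pi) constant cancels out)
        dens = w * np.exp(-0.5 * ((v[:, None] - mu) / sd) ** 2) / sd
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=0) + 1e-9
        w = nk / nk.sum()
        mu = (resp * v[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return resp.argmax(axis=1)

def discrete_mi(a, b):
    """Mutual information (in nats) between two integer label vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)                    # contingency table
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / outer[nz])).sum())

def toy_mmi(x, y, min_bin=5):
    """Toy score: MI of the bin labels plus bin-weighted |Pearson r| within bins.

    The additive aggregation is an assumption made for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = fit_two_gaussians(x), fit_two_gaussians(y)
    score = discrete_mi(a, b)                        # discretized dependency
    for i in range(2):
        for j in range(2):
            m = (a == i) & (b == j)                  # samples in joint bin (i, j)
            if m.sum() >= min_bin and x[m].std() > 0 and y[m].std() > 0:
                r = np.corrcoef(x[m], y[m])[0, 1]
                score += m.mean() * abs(r)           # weight by bin fraction
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = rng.integers(0, 2, 400)                  # hidden low/high condition
    noise = rng.normal(size=400)
    x = 4 * state + noise
    y_dep = 4 * state + 0.8 * noise + 0.3 * rng.normal(size=400)
    y_ind = rng.normal(size=400)
    print("co-expressed pair:", toy_mmi(x, y_dep))
    print("independent pair: ", toy_mmi(x, y_ind))
```

On simulated data like this, a pair that shares both a bimodal state and within-state correlation should score higher than an independent pair; the exact values depend on the random seed and on the assumed aggregation rule.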
