CNNC: Convolutional Neural Networks for Co-Expression Analysis

Co-expression analysis has been used extensively in genomics studies and tools for over two decades. To date, most methods for such analysis are unsupervised and symmetric. Such methods cannot infer causality and are prone to both overfitting and false negatives resulting from differences between cells in bulk studies. Here we present a new, supervised method based on convolutional neural networks (CNNs) for co-expression analysis. We use a normalized histogram image of gene pair co-expression as the input to the CNN. Testing our method on several co-expression prediction tasks, we show that it outperforms prior methods and that scRNA-Seq data lead to more accurate results than bulk data. The method can be directly extended to integrate sequence and epigenetic data and to infer causal relationships. Supporting website with software and data: https://github.com/xiaoyeye/CNNC.
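
The input representation described above, a normalized histogram image for a gene pair, can be sketched roughly as follows. This is an illustrative sketch only, not the code from the supporting repository: the function name `coexpression_histogram`, the 32 x 32 bin count, and the `log1p` transform are assumptions made for the example.

```python
import numpy as np


def coexpression_histogram(expr_a, expr_b, bins=32):
    """Build a normalized 2D histogram "image" for one gene pair.

    expr_a, expr_b: expression values of two genes across the same cells.
    bins: image side length (hypothetical default, not taken from the abstract).
    """
    # Assumed preprocessing: log-transform to compress the dynamic range.
    a = np.log1p(np.asarray(expr_a, dtype=float))
    b = np.log1p(np.asarray(expr_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("both genes must be measured in the same cells")

    # Count cells falling into each (gene a level, gene b level) bin.
    hist, _, _ = np.histogram2d(a, b, bins=bins)

    # Normalize so the image sums to 1, independent of the number of cells.
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy data standing in for expression counts of two genes across 500 cells.
rng = np.random.default_rng(0)
gene_a = rng.poisson(5.0, size=500)
gene_b = gene_a + rng.poisson(2.0, size=500)

image_ab = coexpression_histogram(gene_a, gene_b)
image_ba = coexpression_histogram(gene_b, gene_a)
```

Under these assumptions, `image_ab` is a 32 x 32 array that a CNN could take as a single-channel input. Swapping the gene order transposes the image, so the representation is not symmetric in the two genes, which is what would let a supervised model treat the pairs (a, b) and (b, a) differently.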
