Deep learning for inferring gene relationships from single-cell expression data

Several methods were developed to mine gene-gene relationships from expression data. Examples include correlation and mutual information methods for co-expression analysis, clustering and undirected graphical models for functional assignments and directed graphical models for pathway reconstruction. Using a novel encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all these diverse tasks. We show that our method, CNNC, improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease related genes to causality inference. CNNC9s encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data leading to further improvements in its performance.

[1]  Evan O. Paull,et al.  Inferring causal molecular networks: empirical assessment through a community-based effort , 2016, Nature Methods.

[2]  M. Waterman,et al.  Gene coexpression measures in large heterogeneous samples using count statistics , 2014, Proceedings of the National Academy of Sciences.

[3]  Debora S. Marks,et al.  MicroRNA control of protein expression noise , 2015, Science.

[4]  Steve Horvath,et al.  WGCNA: an R package for weighted correlation network analysis , 2008, BMC Bioinformatics.

[5]  Jie Wang,et al.  Single-Cell Co-expression Analysis Reveals Distinct Functional Modules, Co-regulation Mechanisms and Clinical Outcomes , 2016, PLoS Comput. Biol..

[6]  Claudia Angelini,et al.  Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems , 2014, Front. Cell Dev. Biol..

[7]  Lin Song,et al.  Comparison of co-expression measures: mutual information, correlation, and model based indices , 2012, BMC Bioinformatics.

[8]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[9]  Ziv Bar-Joseph,et al.  Reconstructing dynamic microRNA-regulated interaction networks , 2013, Proceedings of the National Academy of Sciences.

[10]  Peter Bühlmann,et al.  Predicting causal effects in large-scale systems from observational data , 2010, Nature Methods.

[11]  Z. Bar-Joseph,et al.  Linking the signaling cascades and dynamic regulatory networks controlling stress responses , 2013, Genome research.

[12]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Terence P. Speed,et al.  Systematic noise degrades gene co-expression signals but can be corrected , 2015, BMC Bioinformatics.

[14]  Thalia E. Chan,et al.  Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures , 2016, bioRxiv.

[15]  Hongzhe Li,et al.  A Markov random field model for network-based analysis of genomic data , 2007, Bioinform..

[16]  Henning Hermjakob,et al.  The Reactome pathway Knowledgebase , 2015, Nucleic acids research.

[17]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[18]  Alexander E. Kel,et al.  GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments , 2016, Nucleic Acids Res..

[19]  Z. Bar-Joseph,et al.  Using neural networks for reducing the dimensions of single-cell RNA-Seq data , 2017, Nucleic acids research.

[20]  Bartek Wilczynski,et al.  Biopython: freely available Python tools for computational molecular biology and bioinformatics , 2009, Bioinform..

[21]  Harald Binder,et al.  Translating bioinformatics in oncology: guilt-by-profiling analysis and identification of KIF18B and CDCA3 as novel driver genes in carcinogenesis , 2015, Bioinform..

[22]  G. Crawford,et al.  DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. , 2010, Cold Spring Harbor protocols.

[23]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[24]  Attila Balint,et al.  Systematic analysis of complex genetic interactions , 2018, Science.

[25]  Doron Lancet,et al.  MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search , 2016, Nucleic Acids Res..

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Diogo M. Camacho,et al.  Wisdom of crowds for robust gene network inference , 2012, Nature Methods.

[28]  Saurabh Sinha,et al.  On counting position weight matrix matches in a sequence, with application to discriminative motif finding , 2006, ISMB.

[29]  T. Mikkelsen,et al.  Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells , 2016, Nature Communications.

[30]  João Pedro de Magalhães,et al.  Gene co-expression analysis for functional classification and gene–disease predictions , 2017, Briefings Bioinform..

[31]  Jesse Gillis,et al.  Co-expression in Single-Cell Analysis: Saving Grace or Original Sin? , 2018, Trends in genetics : TIG.

[32]  Yonghua Zheng,et al.  The effect of siRNA-mediated lymphocyte-specific protein tyrosine kinase (Lck) inhibition on pulmonary inflammation in a mouse model of asthma. , 2015, International journal of clinical and experimental medicine.

[33]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[34]  Gabriele Sales,et al.  graphite - a Bioconductor package to convert pathway topology to gene network , 2012, BMC Bioinformatics.

[35]  Holger Karas,et al.  TRANSFAC: a database on transcription factors and their DNA binding sites , 1996, Nucleic Acids Res..

[36]  Shane J. Neph,et al.  A comparative encyclopedia of DNA elements in the mouse genome , 2014, Nature.

[37]  S. Oliver Proteomics: Guilt-by-association goes global , 2000, Nature.

[38]  Ziv Bar-Joseph,et al.  DREM 2.0: Improved reconstruction of dynamic regulatory networks from time-series expression data , 2012, BMC Systems Biology.

[39]  Ziv Bar-Joseph,et al.  A web server for comparative analysis of single-cell RNA-seq data , 2018, Nature Communications.