Predicting multicellular function through multi-layer tissue networks

Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue‐specific cellular function remains a critical challenge for biomedicine.

Results: Here, we present OhmNet, a hierarchy‐aware unsupervised node feature learning approach for multi‐layer networks. We build a multi‐layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding‐based low‐dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi‐layer protein interaction network of 107 human tissues. In 48 tissues with known tissue‐specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue‐specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems.

Availability and implementation: Source code and datasets are available at http://snap.stanford.edu/ohmnet.

Contact: jure@cs.stanford.edu
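
The idea of sharing features among proteins active in related tissues can be illustrated with a small sketch. It assumes, purely for illustration and not as a statement of OhmNet's actual objective, that every node of the tissue hierarchy holds its own protein embedding matrix and that a squared-distance penalty pulls each child's embeddings toward its parent's. The tissue names, the `PARENT` map, the weight `lam`, and both helper functions are hypothetical. The per-layer term that rewards similar network neighborhoods is omitted.

```python
import numpy as np

# Hypothetical toy hierarchy (child -> parent); leaves stand for tissue layers.
PARENT = {"brainstem": "brain", "cerebellum": "brain", "brain": "root"}


def hierarchy_penalty(emb, parent, lam=1.0):
    """Sum over child/parent pairs of (lam / 2) * ||emb[child] - emb[parent]||^2."""
    total = 0.0
    for child, par in parent.items():
        diff = emb[child] - emb[par]
        total += 0.5 * lam * float(np.sum(diff ** 2))
    return total


def penalty_gradient(emb, parent, lam=1.0):
    """Gradient of hierarchy_penalty with respect to every embedding matrix."""
    grads = {name: np.zeros_like(mat) for name, mat in emb.items()}
    for child, par in parent.items():
        diff = emb[child] - emb[par]
        grads[child] += lam * diff  # pulls the child toward its parent
        grads[par] -= lam * diff    # pulls the parent toward its child
    return grads


# Random starting embeddings: 5 proteins, 3 features per hierarchy node.
rng = np.random.default_rng(0)
names = ("root", "brain", "brainstem", "cerebellum")
emb = {name: rng.normal(size=(5, 3)) for name in names}

before = hierarchy_penalty(emb, PARENT)
for _ in range(50):  # plain gradient descent on the penalty alone
    grads = penalty_gradient(emb, PARENT)
    for name in emb:
        emb[name] -= 0.1 * grads[name]
after = hierarchy_penalty(emb, PARENT)

print(f"penalty before: {before:.3f}, after: {after:.3f}")
```

Minimizing only this penalty would collapse all tissues onto the same embeddings. In a full model it would be balanced against a per-tissue neighborhood objective, so that related tissues share features while each layer keeps its own network structure. Under the same assumption, an uncharacterized tissue could borrow the embeddings of its parent in the hierarchy.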
