Analysis of human tissue-specific protein-protein interaction networks

Proteins are the core machinery of all living cells, and their interactions determine the inner workings of life itself. Insight into the nature of these interactions is important for understanding how and why cells work. The interactions between all proteins in a cell form a so-called protein-protein interaction (PPI) network, which can be represented as a graph. Not all proteins are present in all cell and tissue types; hence, a protein interaction can occur only in those cell and tissue types where both interacting proteins exist. These tissue-dependent interactions form tissue-specific PPI (TSPPI) networks. In this thesis, we construct and analyze TSPPI networks from different data sources. Our goal is to gain insights into the structure of the interactions and into the properties of specific groups of proteins within the TSPPI networks. To that end, we implement an analysis pipeline and develop efficient analysis algorithms that operate on our graph representation of TSPPI networks. Moreover, we study the basic properties of TSPPI networks and investigate the properties of certain classes of proteins. We then provide a method to identify proteins that gain importance through cellular specialization. Furthermore, we re-evaluate prior research results on a large set of TSPPI networks and demonstrate that some previous conclusions have to be reconsidered. Finally, we employ clustering algorithms with the objective of identifying tissue-specific functional modules within TSPPI networks. In addition to using available clustering methods, we pursue two more approaches.
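The construction rule described above (an interaction belongs to a tissue only if both of its proteins are present there) can be illustrated with a minimal sketch. This is not the thesis's actual pipeline or graph representation: the function name `tissue_specific_edges`, the protein labels, and the toy expression data are hypothetical and serve only to show the filtering step.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def tissue_specific_edges(
    ppi_edges: Iterable[Edge],
    expression: Dict[str, Set[str]],
    tissue: str,
) -> Set[Edge]:
    """Keep only the interactions whose two proteins are both expressed in `tissue`.

    `expression` maps a protein to the set of tissues in which it is present.
    Proteins without expression data are treated as absent (an assumption).
    """
    return {
        (u, v)
        for u, v in ppi_edges
        if tissue in expression.get(u, set()) and tissue in expression.get(v, set())
    }


# Hypothetical toy data: a global PPI network and per-protein tissue expression.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
expression = {
    "A": {"liver", "brain"},
    "B": {"liver"},
    "C": {"liver", "brain"},
    "D": {"brain"},
}

liver = tissue_specific_edges(ppi, expression, "liver")
brain = tissue_specific_edges(ppi, expression, "brain")
print(sorted(liver))  # interactions retained in the liver network
print(sorted(brain))  # interactions retained in the brain network
```

In this toy example the same global network yields different edge sets per tissue, which is the property that makes tissue-specific analysis of degree, centrality, or clustering meaningful.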
