Organization and function acquisition in protein-protein interaction networks

Protein-protein interaction (PPI) networks enable the transmission of biological information throughout cells, allowing cells to respond to environmental stimuli. PPI networks can be represented as graphs, and graph analysis techniques have been applied to determine the topological roles played by individual proteins in PPI network structure. However, more complex analysis is needed to study the functional organization of PPI networks. In addition, the proteins that make up PPI networks change and evolve new functions over time. In the first part of this thesis, we introduce a metric, functional insularity, to measure the degree to which proteins physically interact with functionally related proteins. Proteins in PPI networks exhibit significant variation in insularity values, suggesting the presence of a tradeoff between network modularity and connectivity. Low-insularity proteins (those that interact with many functionally unrelated proteins) are more crucial than high-insularity proteins to maintaining network connectivity, are less likely to be essential, and have more regulators. Furthermore, we show that between-species homologs tend to have similar levels of functional insularity. Low-insularity proteins are found between topological network modules as well as within them. We find that functional and topological network modules contain proteins with a range of insularity values, including low-insularity proteins that may function as “interfaces” to other modules. Finally, we show how functional insularity analysis can be applied to improve network clustering analyses. In the second part of this thesis, we study the acquisition of new functions by proteins and their integration into the PPI network. We first use a maximum parsimony-based approach to infer the ages of human proteins. We then determine various function-related traits for each age group, such as protein-protein interaction count, expression ubiquity, and number of unique domains.
We find that young human proteins have fewer protein-protein interactions, have fewer unique domains, are expressed in fewer tissues, and are less likely to be essential than older proteins. In addition, we find that proteins tend to physically interact mainly with other proteins of similar age. Finally, we find that younger pairs of paralogs are more coexpressed and share more common regulators than older pairs. In sum, this thesis advances our understanding of PPI networks by showing that the dual requirements of modularity and connectivity are balanced using “connector” proteins and “module” proteins, which have distinct biological traits, and by uncovering differences between young and old proteins that suggest that proteins gain functions and integrate into networks over time.
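
As a rough illustration of the idea behind functional insularity, the following minimal Python sketch scores each protein by the fraction of its physical interaction partners that share at least one functional annotation with it. This is an assumed, simplified stand-in and not necessarily the definition used in the thesis: the function name `functional_insularity`, the "shares at least one annotation" criterion, and the toy proteins and annotation labels are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def functional_insularity(
    interactions: Iterable[Tuple[str, str]],
    annotations: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Fraction of each protein's interaction partners that share at least
    one functional annotation with it (a simplified, assumed definition)."""
    # Build an undirected adjacency map, ignoring self-interactions.
    neighbors: Dict[str, Set[str]] = {}
    for a, b in interactions:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores: Dict[str, float] = {}
    for protein, partners in neighbors.items():
        own = annotations.get(protein, set())
        # A partner counts as functionally related if the annotation sets overlap.
        related = sum(1 for q in partners if own & annotations.get(q, set()))
        scores[protein] = related / len(partners)
    return scores


# Toy network (hypothetical proteins and annotation labels).
toy_interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]
toy_annotations = {
    "A": {"f1"},
    "B": {"f1"},
    "C": {"f1", "f2"},
    "D": {"f2"},
    "E": {"f3"},
}
scores = functional_insularity(toy_interactions, toy_annotations)
```

In this toy network, protein C interacts with partners from several functional groups and so receives a lower score than A or B, which is the kind of "connector" behavior the abstract attributes to low-insularity proteins. A real analysis would likely replace the set-overlap test with a graded measure of functional similarity.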
