Function Prediction and Analysis of Mycobacterium tuberculosis Hypothetical Proteins

High-throughput technologies have yielded complete genome sequences and functional genomics data for several organisms, including important microbial pathogens of humans, animals and plants. However, up to 50% of the genes in a genome may be labeled “unknown”, “uncharacterized” or “hypothetical”, which limits our understanding of the virulence and pathogenicity of these organisms. Although the biological functions of the proteins encoded by these genes are not known, many of them are predicted to be involved in key processes. In Mycobacterium tuberculosis in particular, some of these “hypothetical” proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, are suspected to play a crucial role in the intracellular lifestyle of the pathogen and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used it to predict functions for many of its hypothetical proteins. Here we perform functional enrichment analysis of these proteins, based on their predicted biological functions, to identify statistically significant annotations, and we analyse the network properties of hypothetical proteins and compare them with those of known proteins. From the statistically significant annotations and the network information, we attempt to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis “hypothetical” proteins to many basic cellular functions, including the pathogen's adaptability in the host system and its ability to evade the host immune response.
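The abstract does not state which statistical test underlies the enrichment analysis. A common choice for this kind of analysis is the one-sided hypergeometric test, which asks whether an annotation occurs among a set of proteins more often than expected by chance. The following is a minimal sketch under that assumption; the function name, its parameters and the example counts are all hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: total number of proteins considered (assumed background)
    annotated:  proteins in the background carrying the annotation
    sample:     size of the protein set under study
    hits:       proteins in that set carrying the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `hits` or more annotated proteins.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative counts only (hypothetical, not from the study):
# 4000 background proteins, 100 carry the annotation,
# 200 proteins in the set of interest, 15 of which carry it.
p = enrichment_p_value(4000, 100, 200, 15)
print(f"p-value: {p:.3g}")
```

When many annotations are tested at once, the resulting p-values would normally be corrected for multiple testing before any of them is called significant.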
