Structural Annotation of the Mycobacterium tuberculosis Proteome

Of the ∼4000 ORFs identified in the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for only 312. Because knowledge of protein structure is essential for a high-resolution understanding of the underlying biology, we sought to obtain a structural annotation of the genome using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and on a novel binding-site-based ligand association. New algorithms for binding-site detection and for genome-scale structural comparison of binding sites, recently reported from our laboratory, were used. In addition, the annotation covers the detection of various sequence and sub-structural motifs, as well as quaternary structure predictions based on the corresponding templates. The study provides a global perspective on the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. It also yields new insights into the folds that predominate in the genome and into the fold combinations that make up multi-domain proteins. In total, 1728 binding pockets have been associated with ligands through binding-site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for a better understanding of TB and for drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
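The genome-scale comparison of binding sites mentioned above needs a representation of a pocket that does not depend on how the two structures are oriented. As a minimal illustration only, and not the laboratory's published algorithm, the sketch below describes each binding site by the sorted list of all pairwise distances between its points and scores two sites by the fraction of distances that can be paired within a tolerance. The point coordinates, the 0.5 Å tolerance, and the function names `distance_profile` and `profile_similarity` are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def distance_profile(points):
    """Sorted list of all pairwise Euclidean distances between binding-site points.

    `points` is a hypothetical list of (x, y, z) coordinates, e.g. one
    representative atom per binding-site residue.
    """
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def profile_similarity(a, b, tol=0.5):
    """Fraction of distances paired within `tol` (assumed Angstrom).

    Walks both sorted profiles with two pointers, pairing elements that
    differ by at most `tol`, and normalises by the longer profile.
    """
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matched += 1          # pair these two distances
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                # a[i] is too short to pair with anything left in b
        else:
            j += 1                # b[j] is too short to pair with anything left in a
    longest = max(len(a), len(b))
    return matched / longest if longest else 0.0


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    moved = [(x + 10, y + 10, z + 10) for x, y, z in site]  # rigidly translated copy
    print(profile_similarity(distance_profile(site), distance_profile(moved)))
```

Because pairwise distances are unchanged by rigid-body motion, no superposition step is required, which keeps an all-against-all comparison cheap. The trade-off is that different geometries can share similar distance profiles, so a score of this kind would at most serve as a fast pre-filter before a residue-level alignment.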
