1,000 structures and more from the MCSG

Background: The Midwest Center for Structural Genomics (MCSG) is one of the large-scale centres of the Protein Structure Initiative (PSI). During the first two phases of the PSI the MCSG has solved over a thousand protein structures. A criticism of structural genomics is that target selection strategies mean that some structures are solved without having a known function and thus are of little biomedical significance. Structures of unknown function have stimulated the development of methods for function prediction from structure.

Results: We show that the MCSG has met the stated goals of the PSI and use online resources and readily available function prediction methods to provide functional annotations for more than 90% of the MCSG structures. The structure-to-function prediction method ProFunc provides likely functions for many of the MCSG structures that cannot be annotated by sequence-based methods.

Conclusions: Although the focus of the PSI was structural coverage, many of the structures solved by the MCSG can also be associated with functional classes and biological roles of possible biomedical value.
