Structure-Guided Comparative Analysis of Proteins: Principles, Tools, and Applications for Predicting Function

The main objective of this article was to define a ten-step procedure, largely guided by the percent-identity scale, that can be followed as a general rule for inferring the function of an uncharacterized protein. The procedure is by no means exhaustive, but it can serve as a first pass at functional assignment. In many cases, additional clues and complementary information can be obtained from pathway analysis, operon information, and other non-homology-based methods. We demonstrated how, by following the ten steps, a function could be assigned to an uncharacterized conserved protein along with its related sequences. A further goal was to provide an overview of the tools and databases available for comparative sequence and structural analysis.
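Because the procedure is guided by the percent-identity scale, it may help to see how that quantity can be computed from a pairwise alignment. The following is a minimal sketch, not the authors' implementation: the function name `percent_identity`, the gap character, and the choice of denominator (aligned columns in which neither sequence has a gap) are all assumptions made for illustration. Other conventions divide by the alignment length or by the length of the shorter sequence, and they give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences.

    Assumption: only columns where neither sequence has a gap are counted
    in the denominator. This is one common convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the two residues are identical

    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        return 0.0  # no ungapped columns to compare
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences): 4 identical residues
    # out of 5 ungapped columns.
    print(percent_identity("MKV-LA", "MKIALA"))
```

The resulting value would then be read against whatever identity ranges the procedure defines; no thresholds are assumed here.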
