The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation

Structural genomics efforts have led to increasing numbers of novel, uncharacterized protein structures with low sequence identity to known proteins, resulting in a growing need for structure-based function recognition tools. Our method, SeqFEATURE, robustly models protein functions described by sequence motifs using a structural representation. We built a library of models that shows good performance compared to other methods. In particular, SeqFEATURE demonstrates significant improvement over other methods when sequence and structural similarity are low.
