The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognizing functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training-set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
