New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, neither function nor structure is known in detail. Comprehensive efforts towards the expert curation of sequence annotations have failed to keep pace with the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots will be discussed, along with their evolution into the most successful type of prediction of protein function from sequence. Second, a new tool will be described: comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution. While these two new sub-fields of protein prediction represent the methodological breakthroughs that have been achieved, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (i.e., proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
