Predicting protein function from sequence and structure

While the number of sequenced genomes continues to grow, experimentally verified functional annotation of whole genomes remains patchy. Structural genomics projects are also yielding many protein structures of unknown function. Because follow-up experimental investigation is costly and time-consuming, computational methods for predicting protein function are very attractive. A growing number of noteworthy methods predict protein function from sequence and structural data alone, and many of them are readily available to cell biologists, provided that users are aware of the strengths and pitfalls of each technique.
