Exploiting Protein Structures to Predict Protein Functions

The exponential growth in the number of experimentally determined protein structures in the Protein Data Bank (PDB) has provided structural data for an ever-increasing proportion of genomic sequences. Combined with improved sequence-based functional annotation, this has made it possible to predict protein function from structure. In this chapter we discuss a range of methods that aim to recognise enzyme active sites and predict protein-ligand interactions. We then focus on algorithms developed as part of the CATH database of structural domains, where an evolutionary approach is used to recognise proteins with similar functions. While protein domains that share the same structural fold tend to display related functional activities, there are several large domain structure superfamilies that show a high degree of functional diversity. For these cases, we have built novel tools (FLORA and GeMMA) that effectively identify sub-families of functionally linked domains where standard methods of homologue detection (e.g. sequence profile and global structure alignment) fail.
