Recognition of remotely related structural homologues using sequence profiles of aligned homologous protein structures

Extensive experimental and computational effort is being invested in structure recognition to bridge the gap between proteins with known three-dimensional (3-D) structures and those without. One rapid and simple computational approach matches sequence profiles using sensitive profile-matching procedures to identify remotely related homologous families. In adopting this approach, we used profiles generated from structure-based sequence alignments of homologous protein domains of known structure, integrated with their sequence homologues. We present an assessment of this fast and simple approach. About one year ago, we used this approach to identify structural homologues for 315 sequence families that were not known to have any 3-D structural information. The subsequent experimental structure determination for at least one member in 110 of these 315 families allowed a retrospective assessment of the correctness of structure recognition. We demonstrate that the correct fold is detected with an accuracy of 96.4% (106/110). Most of the associations (81/106) are made correctly to the specific structural family. For 23/106, the structure associations are valid at the superfamily level. Thus, profiles of protein families of known structure, when used with a sensitive profile-based search procedure, result in structure associations of high confidence. Further assignment at the superfamily or family level would provide clues to the probable functions of new proteins. Importantly, because we make these profiles publicly available, genome-wide structure assignment can be performed quickly and accurately on a local machine.
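
The proportions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are taken from the text, while the variable names and the `percent` helper are our own assumptions and are not part of the original work. The abstract does not say how the associations left over after the family-level and superfamily-level counts are classified, so the code only reports their number.

```python
# Counts quoted in the abstract; the names are illustrative, not the authors'.
ASSIGNED_FAMILIES = 315      # sequence families given a structural homologue
ASSESSED_FAMILIES = 110      # of those, families with a later experimental structure
CORRECT_FOLD = 106           # assessed families with the correct fold detected
CORRECT_FAMILY = 81          # correct at the specific structural family level
CORRECT_SUPERFAMILY = 23     # valid at the superfamily level


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


fold_accuracy = percent(CORRECT_FOLD, ASSESSED_FAMILIES)
family_share = percent(CORRECT_FAMILY, CORRECT_FOLD)
superfamily_share = percent(CORRECT_SUPERFAMILY, CORRECT_FOLD)
# Correct-fold associations not covered by the two levels reported above.
unclassified = CORRECT_FOLD - CORRECT_FAMILY - CORRECT_SUPERFAMILY

print(f"Fold-level accuracy: {fold_accuracy}%")
print(f"Family-level share of correct folds: {family_share}%")
print(f"Superfamily-level share of correct folds: {superfamily_share}%")
print(f"Correct folds with no level reported: {unclassified}")
```

The fold-level figure reproduces the 96.4% stated in the text; the other two percentages are derived here from the quoted counts and do not appear in the abstract itself.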
