FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties.

FUGUE, a program for recognizing distant homologues by sequence-structure comparison (http://www-cryst.bioc.cam.ac.uk/fugue/), has three key features. (1) Improved environment-specific substitution tables. Substitutions of an amino acid in a protein structure are constrained by its local structural environment, which can be defined in terms of secondary structure, solvent accessibility, and hydrogen-bonding status. The environment-specific substitution tables have been derived from structural alignments in the HOMSTRAD database (http://www-cryst.bioc.cam.ac.uk/homstrad/). (2) Automatic selection of the alignment algorithm, with detailed structure-dependent gap penalties. FUGUE uses the global-local algorithm to align a sequence-structure pair when the two differ greatly in length, and the global algorithm otherwise. The gap penalty at each position of the structure is determined by its solvent accessibility, its position relative to the secondary structure elements (SSEs), and the conservation of the SSEs. (3) Combined information from both multiple sequences and multiple structures. FUGUE is designed to align multiple sequences against multiple structures to enrich the conservation/variation information. We demonstrate that the combination of these three features, as implemented in FUGUE, improves both homology recognition performance and alignment accuracy.
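The second feature can be illustrated with a minimal Python sketch. It is not FUGUE's implementation: the abstract gives no numerical values, so the length-ratio threshold, the base penalty, and the multipliers below are hypothetical, as are the names `Position`, `choose_algorithm`, and `gap_open_penalty`. The sketch only shows the shape of the idea: the algorithm is chosen from the length difference, and gaps cost more at buried positions and inside conserved SSEs.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Structural environment of one position in the template structure."""
    accessible: bool     # True if the residue is solvent accessible
    in_sse: bool         # True if the position lies inside an SSE
    sse_conserved: bool  # True if that SSE is conserved across the family


def choose_algorithm(seq_len: int, struct_len: int,
                     ratio_threshold: float = 1.5) -> str:
    """Pick global-local when the lengths differ greatly, else global.

    ratio_threshold is a hypothetical cut-off, not a value from the paper.
    """
    shorter, longer = sorted((seq_len, struct_len))
    if shorter <= 0:
        raise ValueError("lengths must be positive")
    return "global-local" if longer / shorter > ratio_threshold else "global"


def gap_open_penalty(pos: Position, base: float = 10.0) -> float:
    """Position-dependent gap-opening penalty (all factors are assumptions)."""
    penalty = base
    if not pos.accessible:
        penalty *= 2.0   # assumed: gaps at buried positions are costlier
    if pos.in_sse:
        penalty *= 2.0   # assumed: gaps inside an SSE are costlier
        if pos.sse_conserved:
            penalty *= 1.5  # assumed: more so when the SSE is conserved
    return penalty


if __name__ == "__main__":
    print(choose_algorithm(100, 300))
    print(gap_open_penalty(Position(accessible=True, in_sse=False,
                                    sse_conserved=False)))
```

With these assumed factors, an exposed loop position keeps the base penalty, while a buried position in a conserved SSE receives the largest one, which steers gaps toward surface loops.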
