STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile–profile scoring

STRUCTFAST is a novel profile—profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile–profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments. Proteins 2006. © 2006 Wiley‐Liss, Inc.

[1]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[2]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[3]  C. Sander,et al.  Protein structure comparison by alignment of distance matrices. , 1993, Journal of molecular biology.

[4]  Marcin von Grotthuss,et al.  Detecting distant homology with Meta-BASIC , 2004, Nucleic Acids Res..

[5]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[6]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[7]  Aleksandar Poleksic,et al.  Convergent Island Statistics: a fast method for determining local alignment score significance , 2005, Bioinform..

[8]  A. B. Robinson,et al.  Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Daniel Fischer,et al.  3D‐SHOTGUN: A novel, cooperative, fold‐recognition meta‐predictor , 2003, Proteins.

[10]  Roland L. Dunbrack,et al.  CAFASP3: The third critical assessment of fully automated structure prediction methods , 2003, Proteins.

[11]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[12]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[13]  K Karplus,et al.  What is the value added by human intervention in protein structure prediction? , 2001, Proteins.

[14]  Mariusz Milik,et al.  Algorithm for rapid reconstruction of protein backbone from alpha carbon coordinates , 1997, J. Comput. Chem..

[15]  S. Altschul,et al.  The estimation of statistical parameters for local alignment score distributions. , 2001, Nucleic acids research.

[16]  Leszek Rychlewski,et al.  LiveBench‐8: The large‐scale, continuous assessment of automated protein structure prediction , 2005, Protein science : a publication of the Protein Society.

[17]  Roland L Dunbrack,et al.  Assessment of fold recognition predictions in CASP6 , 2005, Proteins.

[18]  Hongyi Zhou,et al.  Fold recognition by combining sequence profiles derived from evolution and from depth‐dependent structural alignment of fragments , 2004, Proteins.

[19]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[20]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[21]  Mariusz Milik,et al.  Algorithm for rapid reconstruction of protein backbone from alpha carbon coordinates , 1997 .

[22]  Alfonso Valencia,et al.  Assessment of predictions submitted for the CASP6 comparative modeling category , 2005, Proteins.

[23]  Chin-Hsien Tai,et al.  Assessment of CASP6 predictions for new and nearly new fold targets , 2005, Proteins.

[24]  A. Dembo,et al.  Critical Phenomena for Sequence Matching with Scoring , 1994 .

[25]  John P. Overington,et al.  HOMSTRAD: A database of protein structure alignments for homologous families , 1998, Protein science : a publication of the Protein Society.

[26]  Marcin von Grotthuss,et al.  ORFeus: detection of distant homology using sequence profiles and predicted secondary structure , 2003, Nucleic Acids Res..

[27]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2008, Protein science : a publication of the Protein Society.

[28]  A. Sali,et al.  Comparative protein structure modeling of genes and genomes. , 2000, Annual review of biophysics and biomolecular structure.

[29]  Hongyi Zhou,et al.  Single‐body residue‐level knowledge‐based energy score combined with sequence‐profile and secondary structure information for fold recognition , 2004, Proteins.