A segment alignment approach to protein comparison

MOTIVATION: Local structure segments (LSSs) are small structural units shared by unrelated proteins. They are used extensively in protein structure comparison, and predicted LSSs (PLSSs) are used very successfully in ab initio folding simulations. However, LSSs, whether predicted or real, are rarely exploited by protein sequence comparison programs, which are based on position-by-position alignments.

RESULTS: We developed a SEgment Alignment algorithm (SEA) to compare proteins described as collections of predicted local structure segments (PLSSs); such a description is equivalent to an unweighted graph (network). Any specific structure, real or predicted, corresponds to a specific path in this network. SEA then uses a network matching approach to find the two most similar paths in the networks representing two proteins. SEA explores the uncertainty and diversity of predicted local structure information to search for a globally optimal solution. It simultaneously solves two related problems: the alignment of two proteins and the local structure prediction for each of them. On a benchmark of protein pairs with low sequence similarity, we show that SEA improves alignment quality compared with FFAS profile-profile alignment, and that in some cases SEA alignments can match the structural alignments, a feat previously impossible for any sequence-based alignment method.
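The idea of choosing one path per network so that the two paths are as similar as possible can be illustrated with a small Python sketch. This is not the SEA implementation: it enumerates all paths by brute force and scores each pair with a plain Needleman-Wunsch alignment of per-residue labels, whereas the abstract describes a network matching approach. The segment format `(start, end, label)`, the function names, and the match, mismatch, and gap values are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy input: each candidate segment is (start, end, label) and
# covers residues start..end-1 with a single local-structure label.

def paths(segments, length):
    """Enumerate every chain of segments that tiles residues 0..length-1."""
    by_start = {}
    for seg in segments:
        by_start.setdefault(seg[0], []).append(seg)

    def extend(pos):
        if pos == length:
            yield []
            return
        for seg in by_start.get(pos, []):
            for rest in extend(seg[1]):
                yield [seg] + rest

    return list(extend(0))

def labels(path):
    """Expand a path into a per-residue label string."""
    return "".join(label * (end - start) for start, end, label in path)

def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two label strings."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def best_path_pair(segs_a, len_a, segs_b, len_b):
    """Brute force: return (score, path_a, path_b) for the best-scoring pair."""
    best = None
    for pa, pb in product(paths(segs_a, len_a), paths(segs_b, len_b)):
        score = global_score(labels(pa), labels(pb))
        if best is None or score > best[0]:
            best = (score, pa, pb)
    return best

if __name__ == "__main__":
    # Two toy proteins with conflicting predictions for some regions.
    segs_a = [(0, 3, "H"), (0, 3, "E"), (3, 6, "C")]
    segs_b = [(0, 3, "E"), (3, 6, "C"), (3, 6, "H")]
    score, pa, pb = best_path_pair(segs_a, 6, segs_b, 6)
    print(score, labels(pa), labels(pb))
```

Selecting the best pair resolves the ambiguous predictions in both proteins at once, which is the sense in which alignment and local structure prediction are solved together. Enumeration grows exponentially with the number of alternative segments, so it is only practical for toy inputs like this one.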
