combAlign: A protein sequence comparison algorithm considering recombinations

Contemporary sequence alignment algorithms treat sequences as strictly linear, which limits their ability to detect non-order-conserving recombinations. Here, we introduce combAlign, an algorithm that assesses pairwise sequence similarity in the presence of non-order-conserving recombinations on a large scale. combAlign takes a two-level approach. First, it detects locally well-conserved subsequences shared by a target and a source sequence. Second, it maps the relative placement of these local alignments to a graph. Local alignments are concatenated to reassemble the target sequence to the fullest extent, and the maximum-scoring path through the graph denotes the best attainable combAlignment. Parameters influencing this process can be set to meet the user's specific demands. We apply combAlign to examples demonstrating that it can reflect the evolutionary kinship of proteins even if their domains and motifs are strongly rearranged.
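
The second level, finding a maximum-scoring path through a graph of local alignments, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the edge rules or the scoring scheme, so the sketch assumes that local alignments are already given, that two alignments may be chained only if their target intervals do not overlap, and that scores simply add up with no gap or rearrangement penalty. The names `LocalAlignment` and `best_chain` are hypothetical. Source positions are deliberately left unconstrained, which is what allows a non-order-conserving arrangement to be recovered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LocalAlignment:
    """One local alignment; intervals are half-open [start, end). Hypothetical type."""
    t_start: int   # start in the target sequence
    t_end: int     # end in the target sequence (exclusive)
    s_start: int   # start in the source sequence
    s_end: int     # end in the source sequence (exclusive)
    score: float   # local alignment score (assumed additive)


def best_chain(alignments: List[LocalAlignment]) -> Tuple[float, List[LocalAlignment]]:
    """Return the maximum total score and the chain of alignments achieving it.

    Assumed graph: one node per alignment, with an edge a -> b whenever
    a ends in the target at or before b starts. Source order is ignored.
    """
    if not alignments:
        return 0.0, []

    # Sorting by target start yields a topological order of this graph,
    # provided every target interval is non-empty.
    nodes = sorted(alignments, key=lambda a: (a.t_start, a.t_end))
    best = [a.score for a in nodes]   # best chain score ending at each node
    prev = [-1] * len(nodes)          # predecessor index for backtracking

    for j, b in enumerate(nodes):
        for i in range(j):
            a = nodes[i]
            if a.t_end <= b.t_start and best[i] + b.score > best[j]:
                best[j] = best[i] + b.score
                prev[j] = i

    # Backtrack from the highest-scoring end node.
    j = max(range(len(nodes)), key=lambda k: best[k])
    total = best[j]
    chain = []
    while j != -1:
        chain.append(nodes[j])
        j = prev[j]
    chain.reverse()
    return total, chain


# Toy example: the target has segments in order A, B, C; the source carries C, A, B.
a = LocalAlignment(0, 50, 60, 110, 40.0)
b = LocalAlignment(50, 100, 110, 160, 35.0)
c = LocalAlignment(100, 150, 0, 50, 30.0)
d = LocalAlignment(40, 120, 200, 280, 50.0)  # overlaps a, b and c in the target

total, chain = best_chain([d, c, a, b])
print(total, [(x.t_start, x.t_end, x.s_start, x.s_end) for x in chain])
```

In the toy example the chain covers the target from left to right while its source intervals are out of order. A strictly linear alignment would have to give up at least one of the three segments.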
