CORAL-M: Heuristic coding region alignment method for multiple genome sequences

Multiple sequence alignment (MSA) is a tool for studying DNA homology, determining phylogeny, and identifying conserved motifs. Various heuristic MSA methods have been proposed to align multiple sequences. Although these tools align protein, DNA, and RNA sequences successfully, they are less successful with coding region sequences because the resulting alignments may not be consistent with practical observations. We therefore propose CORAL-M, a heuristic alignment method for the coding regions of multiple genome sequences. CORAL-M uses a probabilistic filtration model and locally optimal solutions to align genome sequences codon to codon, under the wobble mask rule, within sliding windows, and thus obtains a near-optimal alignment in linear time. In the experimental results, CORAL-M can be used to find potential functional sites by aligning viral strains of Poliovirus 1–3, Enterovirus 71, and Coxsackievirus 16.
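To make the codon-to-codon comparison concrete, the following is a minimal Python sketch of a wobble-masked comparison scored over sliding windows. It is not the CORAL-M implementation: the abstract does not define the wobble mask rule, so the sketch assumes that two codons match when their first two bases agree (the third, wobble position is ignored). The function names `codons`, `wobble_match`, and `window_identity`, and the default window size of 4 codons, are hypothetical. The probabilistic filtration model and gap handling are omitted.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into upper-case codons.

    Any trailing partial codon (1 or 2 bases) is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3].upper() for i in range(0, usable, 3)]


def wobble_match(a: str, b: str) -> bool:
    """Assumed wobble mask: compare only the first two bases of each codon."""
    return a[:2] == b[:2]


def window_identity(seq_a: str, seq_b: str, window: int = 4) -> List[float]:
    """Fraction of wobble-masked codon matches in each sliding window.

    Windows are `window` codons wide and advance one codon at a time.
    The sequences are compared position by position without gaps.
    """
    ca, cb = codons(seq_a), codons(seq_b)
    n = min(len(ca), len(cb))
    matches = [wobble_match(ca[i], cb[i]) for i in range(n)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(max(n - window + 1, 0))
    ]


if __name__ == "__main__":
    # Two sequences that differ only at third codon positions.
    print(window_identity("ATGGCTAAAGGTTTT", "ATGGCCAAGGGATTC"))
```

Under this assumption, windows with a high match fraction would be candidates for conserved coding segments, while low-scoring windows would need a fuller alignment step.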
