An Improved Scoring Matrix for Multiple Sequence Alignment

Multiple sequence alignment is commonly performed by selecting the alignment with the maximum information content computed from a weight matrix. However, two or more alignments can share the same highest score, which makes the choice of the best alignment ambiguous. This paper addresses the issue by introducing the concept of the joint weight matrix to eliminate the randomness in selecting the best multiple sequence alignment. Alignments with equal scores are iteratively rescored with joint weight matrices of increasing level (nucleotide pairs, triplets, and so on) until a single best alignment is found. This method for resolving ambiguity in multiple sequence alignment is straightforward to implement using the improved scoring matrix.
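
The iterative rescoring described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the joint weight matrix precisely, so the sketch assumes that a level-k joint matrix counts k-tuples of adjacent columns in an ungapped DNA alignment, and that the background is uniform (0.25 per nucleotide). The function names `information_content` and `resolve_ties` are hypothetical.

```python
from collections import Counter
from math import isclose, log2


def information_content(alignment, level=1):
    """Information content (bits) of an ungapped alignment at a given level.

    Assumption: a level-k score sums f * log2(f / 0.25**k) over every window
    of k adjacent columns, where f is the observed frequency of each k-tuple.
    Level 1 is the usual single-column score; level 2 uses nucleotide pairs.
    """
    n_sequences = len(alignment)
    width = len(alignment[0])
    background = 0.25 ** level  # assumed uniform background
    total = 0.0
    for start in range(width - level + 1):
        counts = Counter(seq[start:start + level] for seq in alignment)
        for count in counts.values():
            freq = count / n_sequences
            total += freq * log2(freq / background)
    return total


def resolve_ties(candidates, max_level=None):
    """Rescore tied candidates at increasing levels until one remains.

    Returns the list of surviving candidates; it has more than one element
    only if the tie persists through max_level.
    """
    if max_level is None:
        max_level = len(candidates[0][0])
    tied = list(candidates)
    for level in range(1, max_level + 1):
        scores = [information_content(a, level) for a in tied]
        best = max(scores)
        # Compare with a tolerance because the scores are floating point.
        tied = [a for a, s in zip(tied, scores)
                if isclose(s, best, abs_tol=1e-9)]
        if len(tied) == 1:
            break
    return tied


# Two candidate alignments with identical single-column frequencies.
alignment_a = ["AC", "AC", "GT", "GT"]
alignment_b = ["AC", "AT", "GC", "GT"]
winner = resolve_ties([alignment_b, alignment_a])
```

In this toy example both candidates have the same level-1 score because their column frequencies are identical, so the single-column criterion cannot separate them. At level 2, `alignment_a` contains only two distinct nucleotide pairs while `alignment_b` contains four, so `alignment_a` scores higher and is the only candidate left in `winner`.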
