Identifying DNA motifs based on match and mismatch alignment information

The conventional way of identifying DNA motifs, based solely on match alignment information, is susceptible to a high number of spurious sites. A novel scoring system is introduced that takes both match and mismatch alignment information into account. The mismatch alignment information helps remove spurious sites encountered in DNA motif searching. As an example, a correct TATA box site in the Homo sapiens H4/g gene has been successfully identified based on match and mismatch alignment information.
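The contrast between match-only scoring and scoring that also uses mismatch information can be illustrated with a toy sketch. This is not the paper's scoring system, whose formulas are not given here. It assumes a simple per-position frequency profile built from hypothetical aligned TATA-like sites, a match score that sums the frequency of each observed base, and a mismatch count of positions where a candidate carries a base never seen among the known sites. The function names (build_profile, match_score, mismatch_count, scan), the sequences, and the thresholds are all hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def build_profile(sites):
    """Per-position base frequencies from equal-length, pre-aligned sites."""
    width = len(sites[0])
    profile = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        profile.append({b: counts[b] / len(sites) for b in BASES})
    return profile


def match_score(window, profile):
    """Match information only: sum of the frequency of each observed base."""
    return sum(profile[i].get(base, 0.0) for i, base in enumerate(window))


def mismatch_count(window, profile, min_freq=0.0):
    """Mismatch information: positions whose base frequency is <= min_freq."""
    return sum(1 for i, base in enumerate(window)
               if profile[i].get(base, 0.0) <= min_freq)


def scan(seq, profile, match_threshold, max_mismatches=None):
    """Slide a window over seq; optionally reject windows with too many mismatches."""
    width = len(profile)
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        if match_score(window, profile) < match_threshold:
            continue
        if max_mismatches is not None and mismatch_count(window, profile) > max_mismatches:
            continue
        hits.append((pos, window))
    return hits


# Hypothetical toy data, not taken from the paper.
known_sites = ["TATAAAA", "TATAAAT", "TATATAA", "TATAAAG"]
profile = build_profile(known_sites)
seq = "GGTATCAAAGGCCTATAAAAGG"

print(scan(seq, profile, match_threshold=5.0))
# [(2, 'TATCAAA'), (13, 'TATAAAA')]
print(scan(seq, profile, match_threshold=5.0, max_mismatches=0))
# [(13, 'TATAAAA')]
```

With match information alone, both windows clear the threshold. Adding the mismatch filter rejects the window at position 2, whose C at the fourth position never occurs among the known sites, even though its match score (5.25) is high.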
