Finding Motifs in DNA Sequences Using Low-Dispersion Sequences

Motif finding, abstracted as the planted (l, d)-motif finding problem, is a major task in molecular biology: it is used to locate functional units and genes. In 2002, the random projection algorithm was introduced to solve the challenging (15, 4)-motif finding problem by using randomly chosen templates. Two years later, the uniform projection algorithm was developed to improve on random projection by using low-dispersion sequences generated by coverings. In this article, we introduce an improved projection algorithm, called the low-dispersion projection algorithm, which uses low-dispersion sequences generated from almost difference families that we develop. Compared with the random projection algorithm, the low-dispersion projection algorithm can solve the (l, d)-motif finding problem with fewer templates without decreasing the success rate.
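
To make the setting concrete, the following is a minimal sketch of a planted (l, d)-motif instance and of the bucketing step that projection-based methods rely on. It is an illustration under stated assumptions, not the algorithm of this article: the function names (`plant_instance`, `project`, `bucket_lmers`), the number and length of sequences, and the template size k = 7 are all hypothetical choices, and the template here is drawn at random as in random projection. The low-dispersion construction of templates is not shown, since the abstract does not describe it.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"

def plant_instance(l, d, t, n, rng):
    """Build t random sequences of length n, each containing one copy of a
    random l-mer motif with exactly d substituted positions (hypothetical helper)."""
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, starts = [], []
    for _ in range(t):
        seq = [rng.choice(ALPHABET) for _ in range(n)]
        inst = list(motif)
        # Mutate d distinct positions, each to a different nucleotide.
        for pos in rng.sample(range(l), d):
            inst[pos] = rng.choice([c for c in ALPHABET if c != inst[pos]])
        start = rng.randrange(n - l + 1)
        seq[start:start + l] = inst
        seqs.append("".join(seq))
        starts.append(start)
    return motif, seqs, starts

def project(lmer, template):
    """Keep only the characters of lmer at the template positions."""
    return "".join(lmer[i] for i in template)

def bucket_lmers(seqs, l, template):
    """Hash every l-mer of every sequence by its projection onto the template."""
    buckets = defaultdict(list)
    for si, seq in enumerate(seqs):
        for j in range(len(seq) - l + 1):
            buckets[project(seq[j:j + l], template)].append((si, j))
    return buckets

if __name__ == "__main__":
    rng = random.Random(0)
    l, d, k = 15, 4, 7          # k is an assumed projection size
    motif, seqs, starts = plant_instance(l, d, t=20, n=600, rng=rng)
    template = sorted(rng.sample(range(l), k))   # one randomly chosen template
    buckets = bucket_lmers(seqs, l, template)
    # Planted instances whose mutations all miss the template positions
    # land in the bucket labelled by the motif's own projection.
    hits = buckets.get(project(motif, template), [])
    planted = {(i, s) for i, s in enumerate(starts)}
    print(len([h for h in hits if h in planted]), "planted instances in the motif bucket")
```

A single template catches only the planted instances whose mutated positions fall outside it, which is why projection methods iterate over many templates; reducing how many are needed is the improvement the abstract claims for low-dispersion templates.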
