Algorithms for finding regulatory motifs in DNA sequences

This dissertation considers the problem of discovering regulatory motifs in DNA sequences. Solving this problem provides significant clues for unraveling the genetic interactions required for the proper functioning of a cell. As a preliminary step, known instances of regulatory motifs reported in the biological literature are collected, and a motif model that captures most of these instances is developed. The computational problem of detecting statistically significant motifs from this model in a set of input sequences is then formalized, and efficient algorithms are proposed to solve it. Two such algorithms, YMF and DMOTIFS, are developed; they differ in the criteria they use to assess the statistical significance of motif occurrences. Both algorithms take an enumerative approach, finding the best motifs according to the chosen criterion. A third algorithm, FINDEXPLANATORS, is designed to improve the performance of a certain class of motif-finding algorithms (including YMF and DMOTIFS) by removing redundancies from their output. The proposed algorithms are evaluated on synthetic as well as real data sets, with encouraging results. On data sets from the yeast genome, the YMF-FINDEXPLANATORS suite is shown to perform better than the two state-of-the-art motif-finding algorithms with which it was compared. Several significant novel motifs detected by our programs are reported; these are good candidates for experimental investigation of their regulatory function.
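The general idea of an enumerative motif search can be illustrated with a minimal sketch. It assumes a deliberately simplified setting that is not the motif model or significance criterion of YMF or DMOTIFS: motifs are exact k-mers over {A, C, G, T}, the background is a zero-order (independent-base) model estimated from the input, and each k-mer is scored by a z-score comparing its observed count with the count expected under that background, using a binomial variance that ignores overlapping occurrences. The function name `top_kmers_by_zscore` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def top_kmers_by_zscore(seqs, k, top=5):
    """Hypothetical sketch: exhaustively score every k-mer over ACGT.

    Simplifications (assumptions, not YMF/DMOTIFS): exact k-mers only,
    a zero-order background estimated from the input, and a binomial
    variance that ignores overlapping occurrences.
    """
    # Background base frequencies estimated from the input sequences.
    base_counts = Counter(c for s in seqs for c in s if c in ALPHABET)
    total = sum(base_counts.values())
    freqs = {b: base_counts[b] / total for b in ALPHABET}

    # Observed counts of every k-mer, collected in a single pass.
    observed = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    n_positions = sum(max(len(s) - k + 1, 0) for s in seqs)

    scored = []
    # Enumerate every candidate in the motif space, not only observed ones.
    for letters in product(ALPHABET, repeat=k):
        motif = "".join(letters)
        p = 1.0
        for b in motif:
            p *= freqs[b]
        expected = n_positions * p
        variance = n_positions * p * (1.0 - p)
        if variance == 0.0:
            continue  # motif impossible (or certain) under the background
        z = (observed[motif] - expected) / sqrt(variance)
        scored.append((z, motif))

    # Highest z-score first; ties broken by motif string (descending).
    scored.sort(reverse=True)
    return scored[:top]


if __name__ == "__main__":
    seqs = ["ACGTACGTACGT"]
    for z, motif in top_kmers_by_zscore(seqs, k=4, top=4):
        print(f"{motif}\t{z:.2f}")
```

Because every candidate in the motif space is scored, enumeration is guaranteed to return the best motifs under the chosen score; the price is that the number of candidates grows as 4^k, which is one reason a restricted motif model matters in practice.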