MProfiler: A Profile-Based Method for DNA Motif Discovery

Motif finding is one of the most important tasks in the study of gene regulation, which is essential to understanding biological cell functions. Recent studies show that the performance of current motif finders is not satisfactory. A number of ensemble methods have been proposed to enhance the accuracy of the results, and their overall performance is better than that of stand-alone motif finders. A recent ensemble method, MotifVoter, significantly outperforms all existing stand-alone and ensemble methods. In this paper, we propose a method, MProfiler, that increases the accuracy of MotifVoter without increasing the run time by introducing an idea called center profiling. Our experiments show that MProfiler improves on MotifVoter in the quality of the generated clusters, in both accuracy and cluster compactness. On 56 datasets, the final results of our method achieve an 80% improvement in the correlation coefficient nCC and a 93% improvement in the performance coefficient nPC over MotifVoter.
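The abstract reports results in terms of nCC and nPC without defining them. As a minimal sketch, assuming the nucleotide-level definitions commonly used in motif-finder benchmarks (nPC = TP / (TP + FN + FP), and nCC as the Pearson correlation computed from the TP, FP, FN and TN counts), the two scores could be computed as follows. The function names and the set-of-positions representation are illustrative assumptions, not part of MProfiler or MotifVoter.

```python
import math

def confusion_counts(known, predicted, length):
    """Nucleotide-level counts from two sets of positions (hypothetical helper).

    known     -- set of positions covered by true binding sites
    predicted -- set of positions covered by predicted sites
    length    -- total number of nucleotide positions in the dataset
    """
    tp = len(known & predicted)   # predicted and truly in a site
    fp = len(predicted - known)   # predicted but not in a site
    fn = len(known - predicted)   # in a site but not predicted
    tn = length - tp - fp - fn    # neither predicted nor in a site
    return tp, fp, fn, tn

def npc(tp, fp, fn):
    """Performance coefficient: overlap of known and predicted positions."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def ncc(tp, fp, fn, tn):
    """Correlation coefficient between known and predicted positions."""
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumption: return 0.0 when the coefficient is undefined (zero denominator).
    return (tp * tn - fn * fp) / denom if denom else 0.0

# Toy example: a 100-nt sequence, true site at 10..19, prediction at 15..24.
tp, fp, fn, tn = confusion_counts(set(range(10, 20)), set(range(15, 25)), 100)
print(round(npc(tp, fp, fn), 3), round(ncc(tp, fp, fn, tn), 3))
```

In this toy case the prediction overlaps half of the true site, so both scores fall well below 1. nPC ignores true negatives, whereas nCC also rewards correctly leaving background positions unpredicted.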
