CE3: Customizable and Easily Extensible Ensemble Tool for Motif Discovery

Ensemble methods (or simply ensembles) for motif discovery represent a relatively new approach to improving the accuracy of stand-alone motif finders. The accuracy of an ensemble is determined by the finders it includes and by the strategy (learning rule) used to combine their results, which makes these two choices crucial to the ensemble's success. In this research we propose a general architecture for ensembles, called CE3, which is designed to be extensible and customizable with respect to both the inclusion of external tools and the learning rule. Using CE3, the user can "simulate" existing ensembles and possibly incorporate newly proposed tools (and learning functions) with the aim of improving the ensemble's prediction accuracy. Preliminary experiments performed with a prototype implementation of CE3 led to interesting insights and a critical analysis of the potential and limitations of currently available ensembles.
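To make the two customization points concrete (which finders are plugged in, and which learning rule combines them), here is a minimal Python sketch of a pluggable ensemble. It is an illustration under stated assumptions, not CE3's actual interface: the names `Site`, `Finder`, `LearningRule`, `Ensemble`, `register`, and `nucleotide_vote` are all hypothetical, predicted sites are assumed to be `(sequence id, start, end)` triples with 0-based, end-exclusive coordinates, and finders are modeled as plain functions rather than wrappers around real external tools. The learning rule shown is a simple per-nucleotide vote, chosen only as an example of a swappable rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical site representation: (sequence id, start, end), 0-based, end exclusive.
Site = Tuple[str, int, int]

# Hypothetical finder: maps named sequences to predicted sites.
# A real ensemble would wrap an external tool here.
Finder = Callable[[Dict[str, str]], List[Site]]

# Hypothetical learning rule: combines per-finder predictions into one list of sites.
LearningRule = Callable[[Dict[str, List[Site]], Dict[str, str]], List[Site]]


def nucleotide_vote(min_votes: int) -> LearningRule:
    """Example rule: keep each nucleotide predicted by at least `min_votes`
    finders, then merge runs of kept positions back into sites."""
    def rule(predictions: Dict[str, List[Site]], sequences: Dict[str, str]) -> List[Site]:
        combined: List[Site] = []
        for seq_id, seq in sequences.items():
            votes = [0] * len(seq)
            for sites in predictions.values():
                # Collect covered positions in a set so overlapping sites
                # from the same finder count as a single vote.
                covered = set()
                for sid, start, end in sites:
                    if sid == seq_id:
                        covered.update(range(max(start, 0), min(end, len(seq))))
                for pos in covered:
                    votes[pos] += 1
            # Scan one position past the end so a trailing run is closed.
            run_start = None
            for pos in range(len(seq) + 1):
                kept = pos < len(seq) and votes[pos] >= min_votes
                if kept and run_start is None:
                    run_start = pos
                elif not kept and run_start is not None:
                    combined.append((seq_id, run_start, pos))
                    run_start = None
        return combined
    return rule


@dataclass
class Ensemble:
    rule: LearningRule
    finders: Dict[str, Finder] = field(default_factory=dict)

    def register(self, name: str, finder: Finder) -> None:
        # New tools are added without touching the combination logic.
        self.finders[name] = finder

    def run(self, sequences: Dict[str, str]) -> List[Site]:
        # Run every registered finder, then hand all predictions to the rule.
        predictions = {name: f(sequences) for name, f in self.finders.items()}
        return self.rule(predictions, sequences)


if __name__ == "__main__":
    ens = Ensemble(rule=nucleotide_vote(min_votes=2))
    ens.register("finder_a", lambda seqs: [("s1", 2, 8)])
    ens.register("finder_b", lambda seqs: [("s1", 4, 10)])
    ens.register("finder_c", lambda seqs: [("s1", 5, 7)])
    print(ens.run({"s1": "ACGTACGTACGT"}))  # [('s1', 4, 8)]
```

Because the finders and the rule are independent parameters, "simulating" a different ensemble in this sketch amounts to registering a different set of finders or passing a different `LearningRule`, for example raising `min_votes` to 3, which keeps only the positions all three toy finders agree on.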
