A Parallel, Distributed-Memory Framework for Comparative Motif Discovery

The increasing number of sequenced organisms has opened new possibilities for the computational discovery of cis-regulatory elements (‘motifs’) based on phylogenetic footprinting. Word-based, exhaustive approaches are among the best-performing algorithms; however, they pose significant computational challenges because the number of candidate motifs to evaluate is very high. In this contribution, we describe a parallel, distributed-memory framework for de novo comparative motif discovery. Within this framework, two approaches for phylogenetic footprinting are implemented: an alignment-based and an alignment-free method. The framework is able to statistically evaluate the conservation of motifs in a search space containing over 160 million candidate motifs in a few hours on a distributed-memory cluster with 200 CPU cores. The software is available from http://bioinformatics.intec.ugent.be/blsspeller/
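
To make the idea of a word-based, alignment-free search concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a plain ACGT alphabet, exact words of fixed length, and a simple "present in at least `min_species` sequences" criterion in place of the statistical conservation score. The function names (`kmers`, `conserved_words`, `owner`, `run_worker`), the toy gene families, and the hash-based assignment of words to workers are all hypothetical. Workers are simulated in a loop rather than run as separate processes on a cluster.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k words occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_words(family, k, min_species):
    """Alignment-free check: words found in at least min_species
    sequences of one gene family, regardless of their position."""
    counts = defaultdict(int)
    for seq in family.values():
        for word in kmers(seq, k):
            counts[word] += 1
    return {w for w, c in counts.items() if c >= min_species}


def owner(word, n_workers):
    """Assign a word to a worker from its base-4 encoding
    (hypothetical partitioning scheme; assumes only A, C, G, T)."""
    code = 0
    for ch in word:
        code = code * 4 + "ACGT".index(ch)
    return code % n_workers


def run_worker(rank, n_workers, families, k, min_species):
    """For the words owned by this worker, count the number of
    gene families in which each word is conserved."""
    tally = defaultdict(int)
    for family in families:
        for word in conserved_words(family, k, min_species):
            if owner(word, n_workers) == rank:
                tally[word] += 1
    return dict(tally)


# Toy input: two gene families, each with one sequence per species.
families = [
    {"sp1": "ACGTGA", "sp2": "TTACGT", "sp3": "GGGACG"},
    {"sp1": "CACGTT", "sp2": "ACGTCC", "sp3": "TTTTTT"},
]

# Simulate three workers; their word sets are disjoint, so the
# partial results can be merged without conflicts.
merged = {}
for rank in range(3):
    merged.update(run_worker(rank, 3, families, k=4, min_species=2))
```

Because each word belongs to exactly one worker, the per-worker tallies never overlap and merging them needs no reduction step. On a real distributed-memory cluster, a message-passing library would take the place of the loop over ranks.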
