Combining comparative genomics with de novo motif discovery to identify human transcription factor DNA-binding motifs

BackgroundAs more and more genomes are sequenced, comparative genomics approaches provide a methodology for identifying conserved regulatory elements that may be involved in gene regulations.ResultsWe developed a novel method to combine comparative genomics with de novo motif discovery to identify human transcription factor binding motifs that are overrepresented and conserved in the upstream regions of a set of co-regulated genes. The method is validated by analyzing a well-characterized muscle specific gene set, and the results showed that our approach performed better than the existing programs in terms of sensitivity and prediction rate.ConclusionThe newly developed method can be used to extract regulatory signals in co-regulated genes, which can be derived from the microarray clustering analysis.

[1]  Mathieu Blanchette,et al.  PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences , 2004, BMC Bioinformatics.

[2]  D. Haussler,et al.  Aligning multiple genomic sequences with the threaded blockset aligner. , 2004, Genome research.

[3]  Jason Gertz,et al.  Discovery, validation, and genetic dissection of transcription factor binding sites by comparative and functional genomics. , 2005, Genome research.

[4]  Mathieu Blanchette,et al.  Motif Discovery in Heterogeneous Sequence Data , 2003, Pacific Symposium on Biocomputing.

[5]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[6]  C. Lawrence,et al.  Human-mouse genome comparisons to locate regulatory sites , 2000, Nature Genetics.

[7]  Nicola J. Rinaldi,et al.  Transcriptional regulatory code of a eukaryotic genome , 2004, Nature.

[8]  S. Eddy A Model of the Statistical Power of Comparative Genome Sequence Analysis , 2005, PLoS biology.

[9]  K. Lindblad-Toh,et al.  Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals , 2005, Nature.

[10]  B. Birren,et al.  Sequencing and comparison of yeast species to identify genes and regulatory elements , 2003, Nature.

[11]  Serafim Batzoglou,et al.  Eukaryotic regulatory element conservation analysis and identification using comparative genomics. , 2004, Genome research.

[12]  Wing H Wong,et al.  Sampling motifs on phylogenetic trees. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Bart De Moor,et al.  TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis , 2005, Nucleic Acids Res..

[14]  Graziano Pesole,et al.  Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes , 2004, Nucleic Acids Res..

[15]  Graziano Pesole,et al.  An algorithm for finding signals of unknown length in DNA sequences , 2001, ISMB.

[16]  B. De Moor,et al.  Toucan: deciphering the cis-regulatory logic of coregulated genes. , 2003, Nucleic acids research.

[17]  Bin Li,et al.  Limitations and potentials of current motif discovery algorithms , 2005, Nucleic acids research.

[18]  Ting Wang,et al.  Combining phylogenetic data with co-regulated genes to identify regulatory motifs , 2003, Bioinform..