IEM: an algorithm for iterative enhancement of motifs using comparative genomics data.

Understanding gene regulation is a key step in investigating gene functions and their relationships. Many algorithms have been developed to discover transcription factor binding sites (TFBS), which are predominantly located in the upstream regions of genes and contribute to transcriptional regulation when bound by a specific transcription factor. However, traditional motif-finding methods have shortcomings that can be overcome by using comparative genomics data, which is now increasingly available. Traditional methods for scoring motifs also have limitations. In this paper, we propose a new algorithm called IEM to refine motifs using comparative genomics data. We show the effectiveness of our techniques on several data sets: two sets of experiments were performed with comparative genomics data on five strains of P. aeruginosa, and one set of experiments was performed with similar data on four species of yeast. The weighted conservation score proposed in this paper is an improvement over existing motif scores.
