GAMID: using genetic algorithms for the inference of DNA motifs that are represented in only a subset of sequences of interest

In this work, we present GAMID, an extension of GAMI (Genetic Algorithms for Motif Inference) that allows the system to ignore some of the input sequences when searching for candidate conserved motifs in noncoding DNA. This ability is useful in two settings: when searching co-expressed genes, where not all genes are expected to respond to the same transcription factors, and when searching divergent species, where functional elements might appear only in related species. In both cases, the inferred motif should be allowed to occur in only a subset of the input data. This paper provides background on the problem, describes our approach, and presents results showing that, by excluding some sequences from the match process, GAMID succeeds at finding known functional elements.
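The idea of letting a motif match only a subset of the sequences can be illustrated with a small sketch. This is not GAMID's fitness function, which the abstract does not specify; it is a simplified illustration that assumes a candidate motif is scored by its best ungapped match in each sequence and that a fixed number of the worst-matching sequences may be dropped. The names `best_match_score`, `subset_fitness`, and `max_excluded` are hypothetical.

```python
from __future__ import annotations


def best_match_score(motif: str, sequence: str) -> int:
    """Return the highest count of matching positions between the motif
    and any same-length window of the sequence (ungapped comparison)."""
    m = len(motif)
    if m == 0 or len(sequence) < m:
        return 0
    return max(
        sum(a == b for a, b in zip(motif, sequence[i:i + m]))
        for i in range(len(sequence) - m + 1)
    )


def subset_fitness(
    motif: str, sequences: list[str], max_excluded: int
) -> tuple[int, list[int]]:
    """Sum the per-sequence best-match scores after dropping the
    `max_excluded` lowest-scoring sequences (an illustrative assumption).

    Returns the fitness and the indices of the excluded sequences.
    """
    scores = [best_match_score(motif, s) for s in sequences]
    # Rank sequence indices from worst to best match.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    excluded = sorted(order[:max_excluded])
    fitness = sum(scores[i] for i in order[max_excluded:])
    return fitness, excluded


if __name__ == "__main__":
    seqs = ["TTACGTAA", "GGACGTCC", "TTTTTTTT"]
    # With one exclusion allowed, the poorly matching third sequence is ignored.
    print(subset_fitness("ACGT", seqs, max_excluded=1))
```

In this toy setting, a sequence that lacks the motif no longer lowers the score of an otherwise well-conserved candidate, which is the behavior the abstract motivates for co-expressed genes and divergent species.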
