Preliminary Results for GAMI: A Genetic Algorithms Approach to Motif Inference

We have developed GAMI, an approach to motif inference that uses genetic algorithm search and is designed specifically to work with divergent species and potentially long nucleotide sequences. The system design reduces the size of the search space compared with typical window-location approaches to motif inference. This paper describes the motivation and system design for GAMI, explains how the search space is designed and how it compares with that of other approaches, and presents initial results on data from the literature and on novel tasks. GAMI finds a host of putative conserved patterns; we discuss possible approaches for validating the utility of the conserved regions.
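To make the search-space comparison concrete, the following is a rough sketch and not GAMI's actual implementation. It assumes that a window-location approach picks one window start position in each sequence, and that the alternative searches directly over candidate motif strings of a fixed width. The abstract does not specify either representation, so both are assumptions, and the function names and example sizes are hypothetical.

```python
def window_location_space(seq_lengths, width):
    """Count the ways to choose one window start per sequence.

    Assumption: each sequence contributes (length - width + 1) candidate
    window positions, and a candidate solution picks one per sequence.
    """
    size = 1
    for length in seq_lengths:
        size *= max(length - width + 1, 0)
    return size


def motif_string_space(width, alphabet_size=4):
    """Count candidate motif strings of a fixed width.

    Assumption: a candidate solution is the motif itself, one nucleotide
    (A, C, G, or T) per position, independent of sequence length.
    """
    return alphabet_size ** width


# Hypothetical example: five sequences of 1000 nucleotides, motif width 10.
seq_lengths = [1000] * 5
width = 10

print("window-location candidates:", window_location_space(seq_lengths, width))
print("motif-string candidates:   ", motif_string_space(width))
```

Under these assumptions the window-location count grows with both the number and the length of the sequences, while the motif-string count depends only on the motif width, which is one way a redesigned search space could stay manageable for long sequences.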
