Motif discovery in upstream sequences of coordinately expressed genes

This paper presents a genetic mining approach to discovering highly conserved motifs among the upstream sequences of co-regulated genes. These motifs represent putative cis-regulatory elements that could play an important role in the coordinated expression of these genes. A structured genetic algorithm (St-GA) was used to evolve candidate motifs of variable length. Fitness values were assigned as a function of high-scoring alignments performed with NCBI BLAST. The St-GA performed favorably compared with existing methods on simple (l,k) insertion problems, but was unable to solve the (l,4) insertion problem, which has also proved elusive to other methods. Deterministic crowding was added to the St-GA to help it cope with the multimodal nature of real-world genomic data. The genetic search was performed on a set of genes selected because their expression values are highly predictive of a subtype of pediatric acute lymphoblastic leukemia (ALL). Four high-scoring motifs were obtained that matched subsequences of cis-elements found in the TRANSFAC database. These results suggest that the St-GA approach to motif finding has the potential to be competitive for this type of problem.
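
To make the role of deterministic crowding concrete, the following is a minimal Python sketch of one crowding generation over candidate motif strings. It is not the authors' St-GA: motifs are fixed-length here rather than variable-length, and the `fitness` function is a hypothetical stand-in (best ungapped match count per sequence) used in place of BLAST alignment scores. All function names and the mutation rate are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def fitness(motif, sequences):
    """Stand-in score (assumption, not BLAST): for each sequence, take the
    best ungapped match count over all windows, then sum across sequences."""
    l = len(motif)
    return sum(
        max(l - hamming(motif, seq[i:i + l]) for i in range(len(seq) - l + 1))
        for seq in sequences
    )

def crossover(a, b, rng):
    """One-point crossover of two equal-length motifs."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(motif, rate, rng):
    """Resample each base from the alphabet with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in motif)

def deterministic_crowding_step(pop, sequences, rng, rate=0.05):
    """One generation: each child competes only with its most similar parent,
    which lets distinct motifs (niches) survive side by side.
    Assumes an even population size."""
    pop = pop[:]
    rng.shuffle(pop)
    nxt = []
    for p1, p2 in zip(pop[0::2], pop[1::2]):
        c1, c2 = crossover(p1, p2, rng)
        c1, c2 = mutate(c1, rate, rng), mutate(c2, rate, rng)
        # Pair children with parents so that total parent-child distance is minimal.
        if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
            c1, c2 = c2, c1
        for parent, child in ((p1, c1), (p2, c2)):
            keep_child = fitness(child, sequences) >= fitness(parent, sequences)
            nxt.append(child if keep_child else parent)
    return nxt
```

Because a parent is only ever replaced by a child that scores at least as well, the best score in the population cannot decrease from one generation to the next, while the similarity-based pairing discourages a single motif from taking over the whole population.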