Genetic algorithm for dyad pattern finding in DNA sequences.

This paper presents a novel genetic algorithm for the dyad motif finding problem. The algorithm uses a multi-objective fitness function based on the sum of pairs, the number of matches, and the information content. The individuals required for the population pool are optimized by the Gibbs sampling method, and new crossover and mutation operators are designed. The algorithm is implemented and tested on different types of real datasets. The results are compared with those of other well-known algorithms, and the comparison shows the effectiveness of the proposed algorithm.
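As an illustration of two of the fitness components named above, the following minimal sketch computes a sum-of-pairs score and an information content for a set of equal-length candidate sites. The abstract does not give the paper's exact definitions, so the formulas here are common textbook forms used as assumptions: sum of pairs counts matching positions over all pairs of sites, and information content is the column-wise relative entropy against a uniform background of 0.25. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sum_of_pairs(sites):
    """Count matching positions over all unordered pairs of sites.

    Assumed definition; the paper may weight or normalize differently.
    """
    return sum(
        sum(a == b for a, b in zip(s, t))
        for s, t in combinations(sites, 2)
    )


def information_content(sites, background=0.25):
    """Sum over columns of f * log2(f / background) for each observed base.

    Assumes a uniform background frequency (hypothetical choice).
    """
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for count in Counter(column).values():
            f = count / n
            total += f * math.log2(f / background)
    return total


def dyad_objectives(left_sites, right_sites):
    """Score the two halves of a dyad separately and add the results.

    Returns a (sum_of_pairs, information_content) tuple as a stand-in
    for a multi-objective fitness vector.
    """
    sp = sum_of_pairs(left_sites) + sum_of_pairs(right_sites)
    ic = information_content(left_sites) + information_content(right_sites)
    return sp, ic
```

For example, `dyad_objectives(["ACG", "ACG"], ["TTA", "TTA"])` scores a perfectly conserved dyad whose two halves are separated by a gap that the scoring ignores. A genetic algorithm could rank individuals by such a tuple, or by a weighted combination of its entries.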
