Reconstruction of DNA sequences using genetic algorithms and cellular automata: Towards mutation prediction?

The DNA sequence changes that fuel evolution are, to a certain extent, deterministic, because mutagenesis does not occur in an entirely random manner. So far, it has not been possible to decipher the rules that govern DNA sequence evolution, owing to the extreme complexity of the process as a whole. In our attempt to approach this issue, we focus solely on the mechanisms of mutagenesis and deliberately disregard the role of natural selection. Hence, in this analysis, evolution refers to the accumulation of genetic alterations that originate from mutations and are transmitted through generations without being subjected to natural selection. We have developed a software tool that models a DNA sequence as a one-dimensional cellular automaton (CA) with four states per cell, corresponding to the four DNA bases A, C, T and G. The four states are represented by the digits of the quaternary number system. Moreover, we have developed genetic algorithms (GAs) to determine the CA evolution rules that simulate the DNA evolution process. Only linear evolution rules were considered, and square matrices were used to represent them. If DNA sequences from different evolution steps are available, our approach allows the underlying evolution rule(s) to be determined. Conversely, once the evolution rules are deciphered, our tool may reconstruct the DNA sequence at any previous evolution step for which the exact sequence information is unknown. The tool may also be used to test various parameters that could influence evolution. We describe a paradigm relying on the assumption that mutagenesis is governed by a near-neighbour-dependent mechanism. Based on the satisfactory performance of our system in this deliberately simplified example, we propose that our approach could offer a starting point for future attempts to understand the mechanisms that govern evolution. The developed software is open-source and has a user-friendly graphical input interface.
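The modelling idea described above can be illustrated with a minimal Python sketch. It is not the authors' tool, and several details are assumptions made only for illustration: the encoding A=0, C=1, T=2, G=3, arithmetic modulo 4, periodic boundaries, and a nearest-neighbour linear rule given by three hypothetical weights (left, centre, right), which is equivalent to multiplying the state vector by a banded circulant square matrix. The small GA (elitist selection, one-point crossover, mutation and a few random newcomers per generation) searches for a weight triple that reproduces an observed later sequence from an earlier one.

```python
import random

BASES = "ACTG"  # assumed encoding: A=0, C=1, T=2, G=3 (quaternary digits)

def encode(seq):
    """Map a DNA string to a list of quaternary states."""
    return [BASES.index(b) for b in seq]

def decode(states):
    """Map a list of quaternary states back to a DNA string."""
    return "".join(BASES[s] for s in states)

def step(states, rule):
    """One CA step under an assumed linear nearest-neighbour rule.

    Each cell becomes (wl*left + wc*self + wr*right) mod 4, with periodic
    boundaries. This equals a circulant matrix applied to the state vector.
    """
    wl, wc, wr = rule
    n = len(states)
    return [(wl * states[i - 1] + wc * states[i] + wr * states[(i + 1) % n]) % 4
            for i in range(n)]

def fitness(rule, before, after):
    """Number of positions where the rule's prediction matches the target."""
    return sum(a == b for a, b in zip(step(before, rule), after))

def evolve_rule(before, after, pop_size=30, generations=200, seed=0):
    """Toy GA over weight triples; returns the best rule found."""
    rng = random.Random(seed)

    def random_rule():
        return tuple(rng.randrange(4) for _ in range(3))

    pop = [random_rule() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, before, after), reverse=True)
        if fitness(pop[0], before, after) == len(after):
            break  # perfect reconstruction of the target step
        parents = pop[: pop_size // 2]          # elitist selection
        newcomers = [random_rule() for _ in range(5)]  # keeps diversity
        children = []
        while len(parents) + len(newcomers) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, 3)           # one-point crossover
            child = list(p[:cut] + q[cut:])
            if rng.random() < 0.3:              # point mutation of one weight
                child[rng.randrange(3)] = rng.randrange(4)
            children.append(tuple(child))
        pop = parents + newcomers + children
    return max(pop, key=lambda r: fitness(r, before, after))

if __name__ == "__main__":
    earlier = encode("ACGTTGCAAC")
    later = step(earlier, (1, 2, 1))            # hidden rule, for the demo only
    found = evolve_rule(earlier, later)
    print(found, decode(step(earlier, found)) == decode(later))
```

Several distinct weight triples can produce the same later sequence from one earlier sequence, so the rule returned is one that reproduces the observed step rather than necessarily the one that generated it; comparing sequences from more evolution steps would narrow the candidates.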
