GENERATING NONTRIVIAL LONG-RANGE CORRELATIONS AND 1/f SPECTRA BY REPLICATION AND MUTATION

This paper aims to understand the statistical features of nucleic acid sequences from knowledge of the dynamical process that produces them. Two studies are carried out. First, the mutual information functions of the limiting sequences generated by simple sequence manipulation dynamics with replication and mutation are calculated numerically (and, in some cases, analytically). It is shown that elongation and replication can easily produce long-range correlations, and that these long-range correlations can be destroyed to varying degrees by mutation, depending on the sequence manipulation model. Second, mutual information functions are determined for several human nucleic acid sequences. It is observed that intron sequences (noncoding sequences) tend to have longer correlation lengths than exon sequences (protein-coding sequences).
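
As a rough illustration of the quantities named in the abstract, the sketch below estimates a mutual information function M(d) = sum over (a, b) of P_ab(d) log2[P_ab(d) / (P_a P_b)] for symbols a distance d apart, and applies it to a toy replication-and-mutation process. The toy rule (each symbol is copied twice per step, and each copy flips with probability `p_mut`) and the names `mutual_information` and `replicate_and_mutate` are assumptions made for this example only; they are not taken from the paper and need not match its models or estimators.

```python
import math
import random
from collections import Counter


def mutual_information(seq, d):
    """Estimate M(d) in bits between symbols separated by distance d.

    The marginals are taken from the same pair windows as the joint counts,
    which keeps the estimate non-negative (an assumption of this sketch).
    """
    n = len(seq) - d
    if d < 1 or n <= 0:
        raise ValueError("d must satisfy 1 <= d < len(seq)")
    pairs = Counter(zip(seq[:n], seq[d:]))   # joint counts of (s_i, s_{i+d})
    left = Counter(seq[:n])                  # marginal counts of s_i
    right = Counter(seq[d:])                 # marginal counts of s_{i+d}
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), with p_a = left[a] / n and p_b = right[b] / n
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi


def replicate_and_mutate(seed, steps, p_mut, rng):
    """Toy binary model (hypothetical): duplicate every symbol, then flip
    each copy independently with probability p_mut."""
    seq = list(seed)
    for _ in range(steps):
        nxt = []
        for s in seq:
            for _ in range(2):  # replication: two copies of each symbol
                nxt.append(1 - s if rng.random() < p_mut else s)
        seq = nxt
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    seq = replicate_and_mutate([0], steps=12, p_mut=0.1, rng=rng)
    for d in (1, 2, 4, 8, 16, 32):
        print(d, mutual_information(seq, d))
```

Plotting M(d) against d on log-log axes for different values of `p_mut` is one way to see how strongly mutation erodes the correlations that replication builds up in this toy setting.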
