Information decomposition method to analyze symbolical sequences

We present the information decomposition (ID) method for analyzing symbolic sequences. The method reveals latent periodicity in any symbolic sequence. We show that, for detecting latent periodicity, the ID method has advantages over the Fourier transform, the wavelet transform, and dynamic programming. Examples of latent periods found in poetic texts, DNA sequences, and amino acid sequences are presented. Possible origins of latent periodicity in different kinds of symbolic sequences are discussed.
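The abstract does not spell out the ID algorithm. The following is a minimal sketch of the general idea under one assumption: latent periodicity at a trial period n is scored by the mutual information between each symbol and its phase i mod n. This is the same as comparing the sequence with an artificial periodic sequence 0, 1, ..., n-1, 0, 1, .... It is an illustrative reading, not necessarily the authors' exact formulation. The function names `period_information`, `z_score` and `best_period` are hypothetical. Twice the summed information equals the G-statistic of the symbol-by-phase contingency table. Under independence it is approximately chi-square with (K-1)(n-1) degrees of freedom, where K is the alphabet size. The sketch uses that approximation to put different periods on a common scale.

```python
import math
from collections import Counter


def period_information(seq: str, n: int) -> float:
    """Mutual-information score (in nats, summed over positions) between each
    symbol and its phase i mod n, i.e. its position in a hypothetical
    artificial periodic sequence 0, 1, ..., n-1, 0, 1, ... of period n."""
    length = len(seq)
    joint = Counter((ch, i % n) for i, ch in enumerate(seq))  # symbol x phase table
    symbols = Counter(seq)                                     # row sums
    phases = Counter(i % n for i in range(length))             # column sums
    return sum(
        m * math.log(m * length / (symbols[ch] * phases[j]))
        for (ch, j), m in joint.items()
    )


def z_score(seq: str, n: int) -> float:
    """Normalise 2*I, which under independence is approximately chi-square
    with (K-1)(n-1) degrees of freedom, so that different n can be compared."""
    df = (len(set(seq)) - 1) * (n - 1)
    if df == 0:
        return 0.0
    g = 2.0 * period_information(seq, n)
    return (g - df) / math.sqrt(2.0 * df)


def best_period(seq: str, max_period: int) -> int:
    """Trial period in 2..max_period with the largest normalised score."""
    return max(range(2, max_period + 1), key=lambda n: z_score(seq, n))


if __name__ == "__main__":
    text = "ABC" * 30
    for n in range(2, 10):
        print(n, round(z_score(text, n), 2))
    print("best period:", best_period(text, 10))
```

For the toy sequence `"ABC" * 30`, the raw information is identical at n = 3, 6 and 9, because multiples of the true period capture the same structure. Dividing by the larger number of degrees of freedom at those multiples makes the smallest period, 3, score highest.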
