Enlarged similarity of nucleic acid sequences.

The concept of base alterations in nucleic acid sequences is presented. The number of base alterations is established for sequences of different lengths. A definition of the "enlarged similarity" of nucleic acid sequences, based on sequence base alterations, is introduced. Mutual information between sequences is used as a quantitative measure of the enlarged similarity of two compared sequences. A method for calculating mutual information is developed that takes into account the correlation of bases in the compared sequences. Definitions of correlated similarity and evolution similarity between compared sequences are given. Results of applying the enlarged similarity approach to DNA sequence analysis are discussed.
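As an illustration of mutual information used as a similarity measure, the following minimal sketch computes the standard plug-in estimate for two aligned sequences of equal length. The abstract does not state the authors' formula, and it does not describe their treatment of base correlation, so this is not their method. The function name `mutual_information` is hypothetical. The example also assumes that a base alteration can be read as a consistent relabeling of bases.

```python
from collections import Counter
from math import log2


def mutual_information(seq_a: str, seq_b: str) -> float:
    """Plug-in estimate of mutual information (in bits) between two
    aligned sequences, using position-wise base pairs.

    Assumption: the sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    n = len(seq_a)
    if n == 0:
        return 0.0

    joint = Counter(zip(seq_a, seq_b))  # counts of base pairs (a, b)
    count_a = Counter(seq_a)            # marginal counts in seq_a
    count_b = Counter(seq_b)            # marginal counts in seq_b

    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


original = "ACGT" * 5
# Hypothetical base alteration: every base is consistently relabeled.
altered = original.translate(str.maketrans("ACGT", "CGTA"))

print(mutual_information(original, original))
print(mutual_information(original, altered))
print(mutual_information(original, "A" * len(original)))
```

In this sketch the relabeled sequence shares no identical positions with the original, yet its mutual information with the original equals that of the original with itself. A constant sequence yields zero. This shows how such a measure can register similarity that a plain identity count would miss.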
