2-Adic clustering of the PAM matrix.

In this paper we demonstrate that the system of 2-adic numbers provides new insight into some problems of genetics, in particular the degeneracy of the genetic code and the structure of the PAM matrix in bioinformatics. The 2-adic distance is an ultrametric, and applications of ultrametrics in bioinformatics are not surprising. However, by using the 2-adic numbers we combine the ultrametric with a number-theoretic structure. In this way we find new applications of an ultrametric which differ from those known so far in bioinformatics. We obtain the following results. We show that the PAM matrix $A$ can be expanded into the sum of two matrices, $A = A^{(2)} + A^{(\infty)}$, where the matrix $A^{(2)}$ is 2-adically regular (i.e. its matrix elements are close to locally constant with respect to the 2-adic parametrization of the genetic code discussed earlier by the authors), and the matrix $A^{(\infty)}$ is sparse. We discuss the structure of the matrix $A^{(\infty)}$ in relation to the side chain properties of the corresponding amino acids. We introduce the family of substitution matrices $A(\alpha,\beta) = \alpha A^{(2)} + \beta A^{(\infty)}$, $\alpha, \beta \ge 0$, which should make it possible to vary the alignment procedure in order to take into account the different chemical and geometric properties of the amino acids.
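The two ingredients named above, the 2-adic distance and the weighted family of substitution matrices, can be illustrated with a minimal sketch. It assumes the standard 2-adic distance on integers, |x - y|_2 = 2^(-k) where 2^k is the largest power of 2 dividing x - y. The function names and the 2x2 toy matrices are hypothetical and chosen only for illustration; they are not PAM values and not the authors' parametrization of the genetic code.

```python
from fractions import Fraction


def two_adic_valuation(n: int) -> int:
    """Return the largest k such that 2**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the 2-adic valuation of 0 is infinite")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def two_adic_distance(x: int, y: int) -> Fraction:
    """Standard 2-adic distance |x - y|_2 between two integers."""
    if x == y:
        return Fraction(0)
    return Fraction(1, 2 ** two_adic_valuation(x - y))


def is_ultrametric(points) -> bool:
    """Check the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z))."""
    pts = list(points)
    return all(
        two_adic_distance(x, z)
        <= max(two_adic_distance(x, y), two_adic_distance(y, z))
        for x in pts for y in pts for z in pts
    )


def combine(a_regular, a_sparse, alpha, beta):
    """Entrywise alpha * a_regular + beta * a_sparse, with alpha, beta >= 0."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    return [
        [alpha * p + beta * q for p, q in zip(row_r, row_s)]
        for row_r, row_s in zip(a_regular, a_sparse)
    ]


# Hypothetical toy matrices (NOT real PAM data): a "regular" part and a sparse part.
A_REGULAR = [[1, 2], [3, 4]]
A_SPARSE = [[0, 1], [1, 0]]

if __name__ == "__main__":
    print(two_adic_distance(0, 8))                 # distance between 0 and 8
    print(is_ultrametric(range(16)))               # strong triangle inequality
    print(combine(A_REGULAR, A_SPARSE, 1, 1))      # alpha = beta = 1 gives the plain sum
```

With alpha = beta = 1 the combination reduces to the plain sum of the two parts, which corresponds to the original decomposition; other non-negative weights shift the emphasis between the regular and the sparse part.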
