Very efficient search for nucleotide alignments

We describe a very efficient search for nucleotide alignments, analogous to the recently introduced very efficient search for protein alignment. Protein alignment was based on 20 × 20 adjacency matrices for amino acids, obtained by superposing the labeled amino acid adjacency matrices of the proteins considered. In the same way, one can construct labeled matrices of size 4 × 4 that list the adjacencies of nucleotides in a DNA sequence. The 16 matrix elements correspond to the 16 ordered pairs of adjacent nucleotides. To obtain DNA alignments, one combines the information in the corresponding matrices for a pair of DNA sequences. Each matrix is obtained by inserting the sequential labels of the adjacent nucleotide pairs into the corresponding cells of a 4 × 4 table. When two such matrices are superimposed, one can identify all segments in the two DNA sequences that are shifted relative to one another by the same amount in either direction, without trial-and-error displacements of one sequence relative to the other to find local nucleotide alignments. © 2012 Wiley Periodicals, Inc.
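
The procedure outlined above can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' implementation: the function names `adjacency_table` and `shifted_segments`, the 1-based position labels, and the `min_pairs` threshold are all assumptions introduced here for illustration. Each sequence gets a 4 × 4 table whose cells hold the positions at which a given pair of adjacent nucleotides starts; superimposing two tables cell by cell gives label differences (shifts), and runs of consecutive labels with the same shift mark segments common to both sequences.

```python
from collections import defaultdict

NUCLEOTIDES = "ACGT"


def adjacency_table(seq):
    """Build the 4 x 4 table as a dict: each ordered pair of adjacent
    nucleotides maps to the 1-based positions where that pair starts.
    (The 1-based labeling is an assumption of this sketch.)"""
    table = {a + b: [] for a in NUCLEOTIDES for b in NUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in table:  # skip pairs containing symbols other than A, C, G, T
            table[pair].append(i + 1)
    return table


def shifted_segments(seq_a, seq_b, min_pairs=2):
    """Superimpose the two tables and return (shift, start_in_a, length)
    for every run of at least `min_pairs` consecutive adjacent pairs that
    occur in both sequences at the same relative shift.
    `min_pairs` is a hypothetical threshold, not taken from the paper."""
    table_a = adjacency_table(seq_a)
    table_b = adjacency_table(seq_b)

    # Group matching labels by shift = label in seq_b - label in seq_a.
    by_shift = defaultdict(list)
    for pair in table_a:
        for i in table_a[pair]:
            for j in table_b[pair]:
                by_shift[j - i].append(i)

    segments = []
    for shift, starts in by_shift.items():
        starts.sort()
        run_start = prev = starts[0]
        # A trailing None acts as a sentinel that closes the last run.
        for i in starts[1:] + [None]:
            if i is not None and i == prev + 1:
                prev = i
                continue
            n_pairs = prev - run_start + 1
            if n_pairs >= min_pairs:
                # n consecutive adjacent pairs span n + 1 nucleotides.
                segments.append((shift, run_start, n_pairs + 1))
            if i is not None:
                run_start = prev = i
    return sorted(segments)


if __name__ == "__main__":
    for shift, start, length in shifted_segments("ACGTAC", "TTACGT"):
        print(f"shift {shift:+d}: {length} nucleotides from position {start}")
```

For the toy sequences `ACGTAC` and `TTACGT`, the sketch reports `ACGT` at shift +2 and `TAC` at shift -2, found in a single pass over the superimposed tables rather than by sliding one sequence along the other.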
