Characteristic Sequences for DNA Primary Sequence

A DNA sequence can be identified with a word over the alphabet N = {A, C, G, T}. Characteristic sequences of a DNA sequence are defined in terms of classifications of the nucleic acid bases. Using the characteristic sequences, we construct a set of 2 × 2 matrices to represent DNA primary sequences, based on the frequencies of occurrence of all (0,1) triplets in the characteristic sequences. The leading eigenvalues of these matrices are then computed and used as invariants of the DNA primary sequences. A similarity and dissimilarity analysis based on the characteristic sequences is given for the exon-1 sequences of the beta-globin gene of eight species.
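
The pipeline summarized above can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated in the abstract: the three base classifications used (purine/pyrimidine, amino/keto, weak/strong hydrogen bond), the choice of which class maps to 1, the normalization of triplet counts to frequencies, and the way the eight (0,1) triplet frequencies are arranged into two 2 × 2 matrices (one per value of the first symbol). All function names are hypothetical.

```python
from itertools import product
from math import sqrt

# Three standard two-class groupings of the bases.
# ASSUMPTION: which class is coded as 1 is an arbitrary choice made here.
CLASSES = {
    "purine_pyrimidine": set("AG"),  # 1 = purine (A, G), 0 = pyrimidine (C, T)
    "amino_keto": set("AC"),         # 1 = amino (A, C),  0 = keto (G, T)
    "weak_strong": set("AT"),        # 1 = weak (A, T),   0 = strong (C, G)
}

def characteristic_sequence(dna, ones):
    """Map each base to 1 if it belongs to `ones`, otherwise to 0."""
    return [1 if base in ones else 0 for base in dna.upper()]

def triplet_frequencies(bits):
    """Relative frequency of each of the 8 overlapping (0,1) triplets."""
    counts = {t: 0 for t in product((0, 1), repeat=3)}
    for i in range(len(bits) - 2):
        counts[tuple(bits[i:i + 3])] += 1
    total = max(len(bits) - 2, 1)  # avoid division by zero on short input
    return {t: c / total for t, c in counts.items()}

def triplet_matrices(freq):
    """ASSUMPTION: one 2 x 2 matrix per first symbol x, with entry
    [y][z] holding the frequency of the triplet (x, y, z)."""
    return [[[freq[(x, y, z)] for z in (0, 1)] for y in (0, 1)]
            for x in (0, 1)]

def leading_eigenvalue(m):
    """Largest eigenvalue of a 2 x 2 matrix with non-negative entries.
    The discriminant (a - d)^2 + 4bc is then non-negative, so it is real."""
    (a, b), (c, d) = m
    return ((a + d) + sqrt((a - d) ** 2 + 4 * b * c)) / 2

def invariants(dna):
    """Leading eigenvalues of the triplet matrices, per classification."""
    result = {}
    for name, ones in CLASSES.items():
        freq = triplet_frequencies(characteristic_sequence(dna, ones))
        result[name] = [leading_eigenvalue(m) for m in triplet_matrices(freq)]
    return result
```

Under these assumptions, each sequence is reduced to six numbers (two eigenvalues for each of the three classifications), and two sequences could then be compared by, for example, the Euclidean distance between their invariant vectors.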