Analyzing DNA Strings using Information Theory Concepts

DNA has a vast capacity to carry important information in the form of character strings, and mathematical methods can be applied to convert these strings into numerical values. This paper explores several such methods for analyzing similarity and dissimilarity between sequences. We consider entropy as a measure of information and modify the entropy expression in several ways, including Shannon's entropy, relative entropy, and the entropy of information in compressed form. The paper also applies entropy to the analysis of sequence strings and proposes a Markov model based relative distance for comparing DNA strings. The results are presented as phylogenetic trees, which are validated using the maximum agreement subtree score and the symmetric distance.
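As a rough illustration of the two basic quantities named above, the following is a minimal sketch that computes the textbook Shannon entropy of a sequence's nucleotide composition and the relative entropy (Kullback-Leibler divergence) between the compositions of two sequences. It is not the paper's modified entropy or its Markov model based distance. The function names, the restriction to the alphabet `ACGT`, and the pseudocount used to avoid zero probabilities are all assumptions made for this example.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: only the four canonical nucleotides are counted


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the nucleotide frequencies in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    h = 0.0
    for b in BASES:
        p = counts[b] / total
        if p > 0:  # by convention, 0 * log(0) contributes nothing
            h -= p * math.log2(p)
    return h


def relative_entropy(p_seq: str, q_seq: str, pseudocount: float = 1.0) -> float:
    """KL divergence D(P || Q) in bits between the base compositions of two sequences.

    The pseudocount (a hypothetical smoothing choice) keeps every Q(b) nonzero.
    """
    def composition(seq: str) -> dict:
        counts = Counter(seq.upper())
        total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
        return {b: (counts[b] + pseudocount) / total for b in BASES}

    p, q = composition(p_seq), composition(q_seq)
    return sum(p[b] * math.log2(p[b] / q[b]) for b in BASES)


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))  # uniform composition: 2.0 bits
    print(shannon_entropy("AAAA"))  # single base: 0.0 bits
    print(relative_entropy("AACGT", "ACGGT"))
```

Note that relative entropy is not symmetric, so D(P || Q) and D(Q || P) generally differ. A distance for tree building would need to be symmetrized or otherwise adapted.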