Comparisons among the Novel Measurements Based on Chi Square Criterion for Sequence Dissimilarity and Their Applications to Phylogeny

In this paper, we present several new dissimilarity measures based on the Chi square test. Each protein sequence is characterized by the frequencies of occurrence of the 20 amino acids, and the frequency profile of a sequence is treated as a sample drawn from a population. The value of the Chi square statistic from the Chi square test is then used to measure the dissimilarity between each pair of protein sequences. Several transformations of the Chi square value are also used as dissimilarity measures. For example, to account for unequal sequence lengths, we standardize the length of every protein sequence to 1000 while keeping its amino acid frequencies unchanged. The other transformations are the P value obtained from the Chi square distribution and the normal distribution quantile corresponding to that P value. Using concatenated H-stranded amino acid sequences for the Eutherian orders, we compare the phylogenetic trees produced by these dissimilarity measures. Some of the resulting trees agree with the commonly accepted phylogeny of the Eutherians.
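
The following is a minimal sketch of the first two measures, not the authors' implementation. It assumes the Chi square value is the Pearson statistic of a 2 x 20 contingency table of amino acid counts, and that length standardization means rescaling each count vector to sum to 1000. The function names (`composition`, `chi_square`, `standardize`) are hypothetical, and the P value and normal quantile transformations are not shown.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count occurrences of each of the 20 amino acids in a sequence."""
    counts = Counter(seq.upper())
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def standardize(counts, length=1000):
    """Rescale counts to a fixed total length, keeping frequencies unchanged.

    Assumption: this is what "standardizing the length to 1000" means.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [length * c / total for c in counts]


def chi_square(counts_a, counts_b):
    """Pearson Chi square statistic for a 2 x k table of counts.

    Assumption: the pairwise dissimilarity is this statistic.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both count vectors must be non-empty")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # amino acid absent from both sequences
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat


def dissimilarity(seq_a, seq_b, length=None):
    """Chi square dissimilarity, optionally on length-standardized counts."""
    counts_a, counts_b = composition(seq_a), composition(seq_b)
    if length is not None:
        counts_a = standardize(counts_a, length)
        counts_b = standardize(counts_b, length)
    return chi_square(counts_a, counts_b)
```

Calling `dissimilarity(seq_a, seq_b)` gives the raw Chi square value, while `dissimilarity(seq_a, seq_b, length=1000)` gives the length-standardized variant. A matrix of such pairwise values could then be passed to a distance-based tree-building method.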
