Use of average mutual information signatures to construct phylogenetic trees for fungi

Average mutual information (AMI) has been applied in many fields, including several areas of bioinformatics. In this paper, we evaluate its performance as a measure of evolutionary distance between sequences. We use 16 fungal internal transcribed spacer (ITS) sequences as representatives of the sequences commonly used for species comparison. We generate an AMI-based profile for each species' ITS sequence, then populate a distance matrix for the set of species using either a Euclidean or a correlation distance between AMI profiles. We generate phylogenetic trees using these distance matrices as input. While the resulting trees do not exactly match the accepted fungal phylogeny, they share enough features with it to merit further investigation of AMI as a distance metric and as a tool for inferring relationships. We also simulate the evolution of an ITS sequence to observe how point mutations affect the distance between AMI profiles, and conclude that a correlation distance performs slightly better than a Euclidean distance.
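The pipeline described in the abstract (AMI profile per sequence, pairwise profile distances, simulated point mutations) can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything below is an assumption: the profile uses the common formulation I(k) = sum over nucleotide pairs (X, Y) of p_k(X, Y) log2[p_k(X, Y) / (p(X) p(Y))], where p_k is the joint frequency of bases k positions apart; the correlation distance is taken as one minus the Pearson correlation; and the mutation model is substitution-only. The function names, the `max_lag` and `rate` parameters, and their values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def ami_profile(seq, max_lag=8):
    """Return [I(1), ..., I(max_lag)] in bits for a DNA string.

    Assumed formulation: I(k) = sum p_k(x, y) * log2(p_k(x, y) / (p(x) p(y))).
    Simplification: characters outside ACGT are dropped before counting.
    """
    bases = [c for c in seq.upper() if c in ALPHABET]
    n = len(bases)
    if n <= max_lag:
        raise ValueError("sequence is too short for the requested lags")
    counts = Counter(bases)
    p = {b: counts[b] / n for b in ALPHABET}  # marginal base frequencies
    profile = []
    for k in range(1, max_lag + 1):
        pairs = Counter(zip(bases, bases[k:]))  # bases k positions apart
        total = n - k
        ami = 0.0
        for (x, y), c in pairs.items():
            pxy = c / total
            ami += pxy * math.log2(pxy / (p[x] * p[y]))
        profile.append(ami)
    return profile


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation_distance(u, v):
    """One minus the Pearson correlation of two profiles (assumed definition)."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (su * sv)


def distance_matrix(profiles, dist):
    """Square matrix of pairwise distances between profiles."""
    return [[dist(p, q) for q in profiles] for p in profiles]


def point_mutate(seq, rate, rng):
    """Substitute each base with probability `rate` (substitution-only toy model).

    `rng` is a random.Random instance, passed in so runs are reproducible.
    """
    out = []
    for base in seq:
        if base in ALPHABET and rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Under these assumptions, one would call `ami_profile` on each ITS sequence, pass the list of profiles to `distance_matrix` with either distance function, and hand the resulting matrix to a distance-based tree-building method of choice. Repeatedly applying `point_mutate` to a sequence and recomputing the distance to the original profile gives a rough way to compare how the two distances respond to accumulating substitutions.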
