Improvement of phylogenetic method to analyze compositional heterogeneity

BackgroundPhylogenetic analysis is a key way to understand current research in the biological processes and detect theory in evolution of natural selection. The evolutionary relationship between species is generally reflected in the form of phylogenetic trees. Many methods for constructing phylogenetic trees, are based on the optimization criteria. We extract the biological data via modeling features, and then compare these characteristics to study the biological evolution between species.ResultsHere, we use maximum likelihood and Bayesian inference method to establish phylogenetic trees; multi-chain Markov chain Monte Carlo sampling method can be used to select optimal phylogenetic tree, resolving local optimum problem. The correlation model of phylogenetic analysis assumes that phylogenetic trees are built on homogeneous data, however there exists a large deviation in the presence of heterogeneous data. We use conscious detection to solve compositional heterogeneity. Our method is evaluated on two sets of experimental data, a group of bacterial 16S ribosomal RNA gene data, and a group of genetic data with five homologous species.ConclusionsOur method can obtain accurate phylogenetic trees on the homologous data, and also detect the compositional heterogeneity of experimental data. We provide an efficient method to enhance the accuracy of generated phylogenetic tree.

[1]  J. Thompson,et al.  Multiple sequence alignment with Clustal X. , 1998, Trends in biochemical sciences.

[2]  O. Gascuel,et al.  Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets. , 2002, Molecular biology and evolution.

[3]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[4]  Andreas D. Baxevanis,et al.  Bioinformatics - a practical guide to the analysis of genes and proteins , 2001, Methods of biochemical analysis.

[5]  P. Lewis A likelihood approach to estimating phylogeny from discrete morphological character data. , 2001, Systematic biology.

[6]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[7]  F J Ayala,et al.  Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a case study. , 2000, Molecular phylogenetics and evolution.

[8]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[9]  Rainer Fuchs,et al.  CLUSTAL V: improved software for multiple sequence alignment , 1992, Comput. Appl. Biosci..

[10]  J. Thompson,et al.  The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. , 1997, Nucleic acids research.

[11]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[12]  W. Fitch Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology , 1971 .

[13]  A. Gibbs,et al.  The Diagram, a Method for Comparing Sequences , 1970 .

[14]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[15]  Peter G Foster,et al.  Modeling compositional heterogeneity. , 2004, Systematic biology.

[16]  A. Edwards,et al.  The reconstruction of evolution , 1963 .

[17]  M. Nei,et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. , 2011, Molecular biology and evolution.

[18]  John P. Huelsenbeck,et al.  MrBayes 3: Bayesian phylogenetic inference under mixed models , 2003, Bioinform..

[19]  T. Jukes CHAPTER 24 – Evolution of Protein Molecules , 1969 .

[20]  M. Steel,et al.  Recovering evolutionary trees under a more realistic model of sequence evolution. , 1994, Molecular biology and evolution.

[21]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[22]  D. Higgins,et al.  See Blockindiscussions, Blockinstats, Blockinand Blockinauthor Blockinprofiles Blockinfor Blockinthis Blockinpublication Clustal: Blockina Blockinpackage Blockinfor Blockinperforming Multiple Blockinsequence Blockinalignment Blockinon Blockina Minicomputer Article Blockin Blockinin Blockin , 2022 .

[23]  John P. Huelsenbeck,et al.  MRBAYES: Bayesian inference of phylogenetic trees , 2001, Bioinform..

[24]  S. Jeffery Evolution of Protein Molecules , 1979 .

[25]  Rodrigo Lopez,et al.  Clustal W and Clustal X version 2.0 , 2007, Bioinform..

[26]  B. Larget,et al.  Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees , 2000 .

[27]  Ziheng Yang Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods , 1994, Journal of Molecular Evolution.

[28]  W. Fitch,et al.  Construction of phylogenetic trees. , 1967, Science.