Systematic investigations of gene effects on both topologies and supports: An Echinococcus illustration

In this paper, we propose a high-performance computing toolbox implementing efficient statistical methods for the study of phylogenies. This toolbox, which implements logit models and LASSO-type penalties, provides a way to better understand, measure, and compare the impact of each gene on a global phylogeny. As an application, we study the Echinococcus phylogeny, which is often considered a particularly difficult case. Mitochondrial and nuclear genomes (19 coding sequences) of nine Echinococcus species are considered in order to investigate the molecular phylogeny of this genus. First, we verify that the 19 individual gene trees yield 19 entirely different, unsupported topologies (a topology being the set of sister relationships in a phylogenetic tree when both branch lengths and supports are ignored), and that using the 19 genes together is not sufficient to estimate the phylogeny either. To circumvent this issue and understand the impact of the genes, we computed 43,796 trees using combinations of 13 to 19 genes. Doing so yields 15 topologies. Four of these topologies, which appear more robust and more frequent, are then selected for closer investigation. Refining our statistical analysis further, we extract one particularly robust topology. We also carefully demonstrate the influence of nuclear genes on the likelihood of the phylogeny.
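The figure of 43,796 trees is consistent with building one tree for every subset of 13 to 19 genes drawn from the 19 available. The following minimal Python sketch checks that arithmetic and shows one way such subsets could be enumerated. The one-tree-per-subset reading, the function names, and the placeholder gene labels are assumptions made for illustration; they are not taken from the toolbox itself.

```python
from itertools import combinations
from math import comb

N_GENES = 19      # number of coding sequences stated in the abstract
MIN_SUBSET = 13   # smallest combination size stated in the abstract


def count_gene_subsets(n_genes=N_GENES, min_size=MIN_SUBSET):
    """Count the subsets of size min_size..n_genes drawn from n_genes genes."""
    return sum(comb(n_genes, k) for k in range(min_size, n_genes + 1))


def iter_gene_subsets(genes, min_size=MIN_SUBSET):
    """Lazily yield every subset of `genes` holding at least min_size genes."""
    for k in range(min_size, len(genes) + 1):
        yield from combinations(genes, k)


# Placeholder labels (hypothetical): the real gene names are not listed here.
genes = [f"gene_{i:02d}" for i in range(1, N_GENES + 1)]

print(count_gene_subsets())  # 43796
```

Enumerating lazily with a generator keeps memory use flat, which matters if each subset is handed to a separate tree-inference job.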
