FastNet: Fast and Accurate Statistical Inference of Phylogenetic Networks Using Large-Scale Genomic Sequence Data

An emerging discovery in phylogenomics is that interspecific gene flow has played a major role in the evolution of many different organisms. To what extent is the Tree of Life not truly a tree reflecting strict “vertical” divergence, but rather a more general graph structure, known as a phylogenetic network, which also captures “horizontal” gene flow? The answer to this fundamental question depends not only upon densely sampled and divergent genomic sequence data, but also upon computational methods that can accurately and efficiently infer phylogenetic networks from large-scale genomic sequence datasets. Recent methodological advances have attempted to address this gap. However, in the 2016 performance study of Hejase and Liu, state-of-the-art methods fell well short of the scalability requirements of existing phylogenomic studies.
