An Invariants-based Method for Efficient Identification of Hybrid Species From Large-scale Genomic Data

Coalescent-based species tree inference has become widely used in the analysis of genome-scale multilocus and SNP datasets when the goal is inference of a species-level phylogeny. However, numerous evolutionary processes are known to violate the assumptions of a coalescence-only model and complicate inference of the species tree. One such process is hybrid speciation, in which a species shares its ancestry with two distinct species. Although many methods have been proposed to detect hybrid speciation, only a few have considered both hybridization and coalescence in a unified framework, and these are generally limited to the setting in which putative hybrid species must be identified in advance. Here we propose a method that can examine genome-scale data for a large number of taxa and detect those taxa that may have arisen via hybridization, as well as their potential “parental” taxa. The method is based on a model that considers both coalescence and hybridization together, and uses phylogenetic invariants to construct a test whose computational time scales well with both the number of taxa and the amount of sequence data. We test the method using simulated data for up to 20 taxa and 100,000 bp, and find that the method accurately identifies both recent and ancient hybrid species in less than 30 seconds. We apply the method to two empirical datasets, one composed of Sistrurus rattlesnakes, for which hybrid speciation is not supported by previous work, and one consisting of several species of Heliconius butterflies, for which some evidence of hybrid speciation has been previously found.
