Intragenic conflict in phylogenomic datasets

Most phylogenetic analyses assume that a single evolutionary history underlies each gene. However, both biological processes and errors in dataset assembly can violate this assumption, causing intragenic conflict. The extent to which this conflict is present in empirical datasets is not well documented, but if it is common, it would have far-reaching implications for phylogenetic analyses. Here, we examined several large phylogenomic datasets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable among datasets, affecting from 1% to more than 92% of the genes investigated. To better characterize patterns of conflict, we analyzed four genes with no obvious data assembly errors in more detail. Analyses of simulated data highlighted that alignment error may be one major source of conflict. Whether as part of data analysis pipelines or in order to explore potential biologically compelling intragenic processes, analyses of within-gene signal should become common. The method presented here provides a relatively fast means of identifying conflicts that is agnostic to the generating process. Datasets identified as having high intragenic conflict may either have significant errors in dataset assembly or represent conflict generated by biological processes. Conflicts that are the result of error should be identified and discarded or corrected. For those conflicts that are the result of biological processes, these analyses contribute to the growing consensus that, similar to genomes, genes themselves may exhibit multiple conflicting evolutionary histories across the tree of life.
