Identifying localized biases in large datasets: a case study using the avian tree of life.

Large-scale multi-locus studies have become common in molecular phylogenetics, with new studies continually adding to previous datasets in an effort to fully resolve the tree of life. Total evidence analyses that combine existing data with newly collected data are expected to increase the power of phylogenetic analyses to resolve difficult relationships. However, they might be subject to localized biases, with one or a few loci having a strong and potentially misleading influence on the results. To examine this possibility, we combined a newly collected 31-locus dataset that includes representatives of all major avian lineages with a published dataset of 19 loci that has a comparable number of sites (Hackett et al., 2008. Science 320, 1763-1768). This allowed us to explore the advantages of conducting total evidence analyses and to determine whether it was also important to analyze new datasets independently of published ones. The total evidence analysis yielded results very similar to the published results, with only slightly increased support at a few nodes. However, analyzing the 31- and 19-locus datasets separately highlighted several differences. Two clades received strong support in the published dataset and in the total evidence analysis, but that support appeared to reflect bias at a single locus (β-fibrinogen [FGB]). The signal in FGB that supported these relationships was sufficient to result in their recovery with bootstrap support, even when combined with 49 loci lacking that signal. FGB did not appear to have a substantial impact on the results of species tree methods, but another locus (brain-derived neurotrophic factor [BDNF]) did affect those analyses. These results demonstrated that localized biases can influence large-scale phylogenetic analyses, but they also indicated that considering independent evidence and exploring multiple analytical approaches could reveal such biases.

[69]  J. Dumbacher,et al.  Adenylate Kinase Intron 5: A New Nuclear Locus for Avian Systematics , 2001 .

[70]  Sudhir Kumar,et al.  Evolution of modern birds revealed by mitogenomics: timing the radiation and origin of major orders. , 2011, Molecular biology and evolution.