Using data-display networks for exploratory data analysis in phylogenetic studies

Exploratory data analysis (EDA) is a frequently undervalued part of data analysis in biology. It involves evaluating the characteristics of the data before proceeding to the definitive analysis of the scientific question at hand. For phylogenetic analyses, a useful tool for EDA is a data-display network. This type of network is designed to display any character (or tree) conflict in a data set, without prior assumptions about the causes of that conflict. The conflict might be caused by 1) methodological issues in data collection or analysis, 2) homoplasy, or 3) horizontal gene flow of some sort. Here, I use splits networks to explore 13 published data sets, as examples of data-display networks applied to EDA. In each case, I performed an original EDA on the data provided, to highlight the aspects of the resulting network that are important for interpreting the phylogeny. In each case, there is at least one important point (possibly missed by the original authors) that might affect the phylogenetic analysis. I conclude that EDA should play a greater role in phylogenetic analyses than it currently does.
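At its simplest, the character conflict that a splits network displays is pairwise incompatibility between the bipartitions (splits) that characters induce on the taxa: two splits A|A' and B|B' are compatible only if at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty. The following is a minimal Python sketch of that test, assuming binary (two-state) characters and a small hypothetical toy matrix. It only flags which pairs of characters conflict; it does not construct a network, and it is not the procedure used in the study.

```python
from itertools import combinations


def split_from_column(taxa, column):
    """Split induced by a binary (two-state) character.

    Returned as the set of taxa whose state differs from the first taxon's,
    so a split and its complement map to the same canonical side.
    """
    ref = column[0]
    return frozenset(t for t, s in zip(taxa, column) if s != ref)


def compatible(a, b, taxa):
    """Splits A|A' and B|B' are compatible iff at least one of
    A∩B, A∩B', A'∩B, A'∩B' is empty."""
    everything = frozenset(taxa)
    return any(
        not (x & y)
        for x in (a, everything - a)
        for y in (b, everything - b)
    )


def conflicting_pairs(matrix):
    """Return index pairs (i, j) of characters whose splits are incompatible.

    `matrix` maps taxon name -> string of binary states, one per character
    (a hypothetical input format chosen for this sketch).
    """
    taxa = list(matrix)
    n_chars = len(next(iter(matrix.values())))
    splits = [
        split_from_column(taxa, [matrix[t][i] for t in taxa])
        for i in range(n_chars)
    ]
    return [
        (i, j)
        for i, j in combinations(range(n_chars), 2)
        if not compatible(splits[i], splits[j], taxa)
    ]


if __name__ == "__main__":
    # Hypothetical toy matrix: 4 taxa x 3 binary characters.
    toy = {"A": "000", "B": "010", "C": "100", "D": "111"}
    print(conflicting_pairs(toy))  # [(0, 1)]
```

Here character 0 groups {C, D} and character 1 groups {B, D}; all four intersections are non-empty, so no single tree can display both, and a data-display network would show this as a reticulation rather than forcing a choice between them.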
