THE AUTOMATION AND EVALUATION OF NESTED CLADE PHYLOGEOGRAPHIC ANALYSIS

Abstract

Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for performing the whole analysis, from the data through to the inferences drawn from them. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses objectively. To do so, a number of decisions must be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user would perform NCPA. We simulate a large number of replicate datasets for geographically distributed but entirely random-mating populations, and analyze them with the automated NCPA algorithm. The results indicate that NCPA tends to produce a high frequency of false positives. In our simulations, 14% of the clades give a conclusive inference that a demographic event has occurred, and 75% of the datasets have at least one clade that gives such an inference. This is mainly because multiple statistics are generated per clade, and only one of them needs to be significant for the inference key to be applied. We survey the inferences made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance, and contiguous range expansion) are also those commonly inferred in our simulations. However, published datasets typically yield a richer set of inferences with NCPA than our random-mating simulations do, and further testing of NCPA with models of structured populations is needed to assess its accuracy.
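The abstract attributes most false positives to computing several statistics per clade while requiring only one of them to be significant. The arithmetic behind that inflation can be sketched in Python under a deliberately simplified assumption: the tests are independent and each has an exact null p-value distributed Uniform(0, 1). Real NCPA statistics within a clade, and across nested clades of one dataset, are correlated, so this is an idealization and not the calculation used in the study. The function names are hypothetical.

```python
import random


def p_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests is significant.

    Assumes independence (an idealization for NCPA statistics).
    """
    return 1.0 - (1.0 - alpha) ** k


def p_dataset_flagged(p_clade: float, n_clades: int) -> float:
    """Chance that at least one of n_clades clades yields a conclusive inference,
    if each clade independently does so with probability p_clade (hypothetical)."""
    return 1.0 - (1.0 - p_clade) ** n_clades


def simulate_any_significant(k: int, alpha: float = 0.05,
                             reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of p_any_significant.

    Under the null hypothesis an exact test's p-value is Uniform(0, 1),
    so each test is 'significant' when a uniform draw falls below alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, round(p_any_significant(k), 3),
              round(simulate_any_significant(k, reps=20_000), 3))
```

With ten independent tests at alpha = 0.05, `p_any_significant(10)` is about 0.40, so a clade would be flagged roughly 40% of the time even with no demographic signal. `p_dataset_flagged` applies the same reasoning one level up, across the clades of a dataset, which is why a modest per-clade rate can translate into a much higher per-dataset rate.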
