Conserved genes have found their way into the mainstream of molecular systematics. Many of these genes are members of multigene families. A difficulty with using single genes of multigene families for phylogenetic inference is that genes from one species may be paralogous to those from another taxon. We focus attention on this problem using heat shock 70 (HSP70) genes. Using polymerase chain reaction techniques with genomic DNA, we isolated and sequenced 123 distinct sequences from 12 species of sharks. Phylogenetic analysis indicated that the sequences cluster with constitutively expressed cytoplasmic heat shock–like genes. Three highly divergent gene clades were sampled. A number of similar sequences were sampled from each species within each distinct gene clade. Comparison of published species trees with an HSP70 gene tree inferred using Bayesian phylogenetic analysis revealed several cases of gene duplication and differential sorting of gene lineages within this group of sharks. Gene tree parsimony based on the objective criteria of duplications and losses showed that previously published hypotheses of species relationships and two novel hypotheses based on Bayesian phylogenetics were concordant with the history of HSP70 gene duplication and loss. By contrast, two published hypotheses based on morphological data were not significantly different from the null hypothesis of a random association between species relatedness and the HSP70 gene tree. These results suggest that gene tree parsimony using data from multigene families can be used for inferring species relationships or testing published alternative hypotheses. More importantly, the results suggest that systematic studies relying on phylogenetic inferences from HSP70 genes may be plagued by unrecognized paralogy of sampled genes.
Our results underscore the distinction between gene and species trees and highlight an underappreciated source of discordance between gene trees and organismal phylogeny, i.e., unrecognized paralogy of sampled genes.

[Gene tree parsimony; HSP70; molecular systematics; multigene families; orthology; paralogy; sharks.]

Phylogenetic trees inferred from different genes are often dissimilar (Hasegawa et al., 1992; Ruvolo, 1997; Brown et al., 2001). Lack of concordance may be due to confounding effects of homoplasy (Adachi and Hasegawa, 1995; Naylor and Brown, 1998) or different gene phylogenies due to differential sorting or horizontal gene transfer (Wu, 1991; Hudson, 1992; Page, 1994; Maddison, 1996, 1997). Another source of discordance between gene trees and species trees is paralogy of sampled genes (Goodman et al., 1979; Page, 1994; Maddison, 1997; Page and Charleston, 1997; Slowinski and Page, 1999). It is generally true that most genes in the nuclear genome are members of multigene families (Henikoff et al., 1997; Slowinski and Page, 1997: Fig. 2). Gene families typically evolve by a process of birth and death of gene lineages (Ota and Nei, 1994; Walsh, 1995; Nei et al., 1997). Genes are born by gene duplication from unequal crossing over, replicative transposition, or polyploidization and are lost by deletion mutations or are gradually silenced by the accumulation of deleterious mutations (Li and Graur, 1991; Rowen et al., 1996; Sidow, 1996; Wolfe and Shields, 1997). Gene family complexity (the number of different genes) is the result of the difference between the birth and death rates of individual genes. Birth rates of gene duplication in populations can be high, ranging from 10⁻⁸ to 10⁻³ duplications per locus per generation (Gelbart and Chovnick, 1979; Shapira and Finnerty, 1986; Fryxell, 1996; Lynch and Conery, 2000).
Such high rates of gene duplication, in contrast with the relative conservation of gene functions and of the primary structure of most genes, suggest that turnover of individual genes within a family of related genes can be rapid. One consequence of this process is that assumptions of orthology for isolated genes may be questionable, and phylogenetic inference from nuclear genes may be subject to errors from unrecognized paralogy. For example, gene duplication may occur such that ancestral species possess two paralogous gene copies, but the paralogues may undergo differential loss (or decay) in descendant species, yielding a gene tree that is incongruent with the species tree (Fig. 1). Failure to sample orthologous genes can also stem from methodological bias (Fig. 1), either because not all paralogous genes were characterized or because the particular method of isolating genes was biased toward some members of the gene family (e.g., see Wagner et al., 1994). Similarly, failure to sample orthologous genes may result in inaccurate estimates of divergence time between species because the genes record the timing of gene duplication and not necessarily species divergence.

2002 MARTIN AND BURG: PHYLOGENETICS USING MULTIGENE FAMILIES

FIGURE 1. Two ways a gene tree can provide a misleading estimate of a species tree due to paralogy. The species tree is indicated by the solid tree outline, and the gene lineages are contained within the species tree vessel. The black circle marks a gene duplication event, resulting in two gene lineages. The left diagram shows differential gene extinction (indicated by the daggers) such that the estimated species tree is incorrect. Similarly, all three species may retain both paralogous genes, but different paralogs were sampled (indicated by the arrows on the right diagram). As in the first case, the resulting species tree is incorrect.
These considerations underscore the possibility that the accuracy of phylogenetic inference may be compromised by errors in assigning orthology. In some cases it is not possible to ascertain orthology with confidence, and researchers have attempted to extract information about relationships among taxa from only those genes of a multigene family tree assumed to be orthologous (Livak et al., 1995). Ruvolo and Koh (1996) noted, however, that a gene tree should be evaluated in its entirety as an estimator of the species tree, implying that focusing on putative orthologous genes while ignoring other genes is not legitimate. Nonindependence of genes within a gene family may also interfere with estimation of the evolutionary history of species; for instance, tandemly repeated genes can undergo gene conversion. Sanderson and Doyle (1992) showed that gene families with intermediate rates of gene conversion have the highest levels of homoplasy and the largest numbers of equally parsimonious trees, with the lowest bootstrap values. Recognition of these problems associated with nuclear genes has led a number of researchers to search for so-called single-copy genes. Bona fide examples of single-copy genes known to be orthologous in different vertebrate taxa are rare. Even genes encoded by the mitochondrial genome often exist as members of multigene families because of historical transfers of genes from the mitochondrial genome to the nucleus (Bensasson et al., 2001). Thus, the possibility exists that all genes are currently or have been members of multigene families, and even if genes exist as single copies in the genomes of taxa under investigation, the sampled genes may be paralogous. Problems associated with unrecognized paralogy can be avoided by simply not making the assumption that sampled genes are orthologous.
Instead, the choice among rival phylogenetic hypotheses can be based on the fit of the gene tree to alternative species trees (Goodman et al., 1979; Page, 1994; Page and Charleston, 1997; Slowinski et al., 1997). Page (1994) referred to this procedure as reconciliation, and Slowinski and Page (1999) noted that this process is analogous to optimizing individual characters on trees. If there is perfect agreement between the gene tree and species tree (compare the gene tree and hypothesis 1 in Fig. 2), then the gene tree perfectly reconciles with the species tree and it is possible to extract the history of gene duplication and speciation directly from the gene tree. If the match between gene tree and organismal tree is not perfect (compare the gene tree and hypothesis 2 in Fig. 2), then the reconciled tree will be different from the gene tree mainly because it is necessary to postulate additional gene duplications and losses. Using this approach, it is possible to choose between the two alternative hypotheses of phylogenetic relationships among species depicted in Figure 2 using parsimony as an objective criterion. Slowinski and Page (1999) referred to this approach as gene tree parsimony. In this particular case, hypothesis 2 would be rejected in favor of hypothesis 1 because hypothesis 2 requires an additional duplication and four instances of gene loss relative to hypothesis 1.

SYSTEMATIC BIOLOGY VOL. 51

FIGURE 2. Differences in evolutionary history revealed by reconciling a gene tree with two different hypotheses of relationships among species A, B, and C. Numbers correspond to different genes. Open circles denote duplication events. Solid branches represent sampled lineages, and shaded branches represent missing (or extinct) lineages.
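The reconciliation idea can be illustrated with a minimal sketch. It assumes rooted binary trees written as nested tuples and uses the commonly described last-common-ancestor (LCA) mapping to count duplications and losses; the text above does not state which algorithm or software performs the reconciliation, so this is an illustration rather than the authors' procedure. The gene tree, the function names (`reconcile`, `species_of`), and the convention that a gene label such as `A1` belongs to species `A` are all hypothetical, and the example tree is not the one drawn in Figure 2.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def index_species_tree(tree, parent=None, depth=0, index=None):
    """Map each species-tree node (keyed by its leaf set) to (parent, depth)."""
    index = {} if index is None else index
    node = leaf_set(tree)
    index[node] = (parent, depth)
    if not isinstance(tree, str):
        for child in tree:
            index_species_tree(child, node, depth + 1, index)
    return index


def lca(index, a, b):
    """Climb from the deeper node until both lineages meet."""
    while a != b:
        if index[a][1] >= index[b][1]:
            a = index[a][0]
        else:
            b = index[b][0]
    return a


def reconcile(gene_tree, species_tree, species_of):
    """Return (duplications, losses) implied by fitting gene_tree to species_tree."""
    index = index_species_tree(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def visit(node):
        if isinstance(node, str):
            return frozenset([species_of(node)])
        maps = [visit(child) for child in node]
        here = lca(index, maps[0], maps[1])
        # A gene node is a duplication if it maps to the same species node as a child.
        is_dup = here in maps
        counts["duplications"] += int(is_dup)
        for m in maps:
            gap = index[m][1] - index[here][1]
            # Each species-tree branch skipped on the way down implies one loss.
            counts["losses"] += gap if is_dup else gap - 1
        return here

    visit(gene_tree)
    return counts["duplications"], counts["losses"]


# Hypothetical data: two paralogs (1 and 2) sampled from species A, B, and C.
def species_of(gene):
    return gene.rstrip("0123456789")


gene_tree = (("A1", ("B1", "C1")), ("A2", ("B2", "C2")))
hypothesis_1 = ("A", ("B", "C"))
hypothesis_2 = (("A", "B"), "C")

print(reconcile(gene_tree, hypothesis_1, species_of))  # (1, 0)
print(reconcile(gene_tree, hypothesis_2, species_of))  # (3, 6)
```

Under gene tree parsimony, the species tree with the lower combined cost is preferred. In this toy case, hypothesis 1 needs one duplication and no losses, whereas hypothesis 2 needs three duplications and six losses.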
A variety of gene families have diversified during the last 500 million years of chordate evolution, and many of these gene families would be suitable candidates for exploring the utility and accuracy of inferring phylogenetic relationships of vertebrates from complex gene trees. Here, we focus on the analysis of the heat shock 70 (HSP70) gene family. HSP70 is an ideal choice because much is known about the structure, function, phylogeny, and evolution of these genes (Lindquist and Craig, 1988; Morimoto et al., 1994; James et al., 1997; Feder and Hofmann, 1999). Moreover, HSP70 genes have been used for inferr
[1] J. Gulick. Divergent evolution and the Darwinian theory. American Journal of Science, 1890.
[2] W. Gelbart, et al. Spontaneous unequal exchange in the rosy region of Drosophila melanogaster. Genetics, 1979.
[3] G. Moore, et al. Fitting the gene lineage into its species lineage. 1979.
[4] J. Sambrook, et al. Molecular Cloning: A Laboratory Manual. 2001.
[5] L. Compagno. Relationships of the megamouth shark, Megachasma pelagios (Lamniformes, Megachasmidae), with comments on its feeding habits. 1990.
[6] S. Lindquist, et al. Heat Shock. Springer Berlin Heidelberg, 1991.
[7] M. Sanderson, et al. Reconstruction of organismal and gene phylogenies from data on multigene families: concerted evolution, homoplasy, and confidence. 1992.
[8] J. W. Pendleton, et al. Surveys of Gene Families Using Polymerase Chain Reaction: PCR Selection and PCR Drift. 1994.
[9] R. Page. Maps between trees and cladistic analysis of historical associations among genes. 1994.
[10] M. Moltó, et al. Phylogenetic relationships between Drosophila subobscura, D. guanche and D. madeirensis based on Southern analysis of heat shock genes. Hereditas, 2004.
[11] R. Gupta, et al. Cloning of Giardia lamblia heat shock protein HSP70 homologs: implications regarding origin of eukaryotic cells and of endoplasmic reticulum. Proceedings of the National Academy of Sciences of the United States of America, 1994.
[12] K. Livak, et al. Variability of dopamine D4 receptor (DRD4) gene sequence within and among nonhuman primate species. Proceedings of the National Academy of Sciences of the United States of America, 1995.
[13] J. B. Walsh, et al. How often do duplicated genes evolve new functions? Genetics, 1995.
[14] M. Hasegawa, et al. Phylogeny of whales: dependence of the inference on species sampling. Molecular Biology and Evolution, 1995.
[15] Re: The questionable implications of the dopamine D4 receptor (DRD4) gene tree for primate phylogeny. Molecular Phylogenetics and Evolution, 1996.
[16] Chapter 5. Evolutionary Relationships of the White Shark: A Phylogeny of Lamniform Sharks Based on Dental Morphology. 1996.
[17] A. Sidow. Gen(om)e duplications in the evolution of early vertebrates. Current Opinion in Genetics & Development, 1996.
[18] K. J. Fryxell, et al. The coevolution of gene family trees. Trends in Genetics, 1996.
[19] J. Graves, et al. Genetic population structure of the shortfin mako (Isurus oxyrinchus) inferred from restriction fragment length polymorphism analysis of mitochondrial DNA. 1996.
[20] W. Maddison. Gene Trees in Species Trees. 1997.
[21] M. Ruvolo, et al. Molecular phylogeny of the hominoids: inferences from multiple independent DNA sequence data sets. Molecular Biology and Evolution, 1997.
[22] A. Knight, et al. Inferring species trees from gene trees: a phylogenetic analysis of the Elapidae (Serpentes) based on the amino acid sequences of venom proteins. Molecular Phylogenetics and Evolution, 1997.
[23] K. H. Wolfe, et al. Molecular evidence for an ancient duplication of the entire yeast genome. Nature, 1997.
[24] R. Page, et al. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution, 1997.
[25] M. Nei, et al. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proceedings of the National Academy of Sciences of the United States of America, 1997.
[26] E. Buckler, et al. The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics, 1997.
[27] L. Hood, et al. Gene families: the taxonomy of protein paralogs and chimeras. Science, 1997.
[28] Andrew P. Martin, et al. Chapter 13. Interrelationships of Lamniform Sharks: Testing Phylogenetic Hypotheses with Sequence Data. 1997.
[29] B. Gaut, et al. DNA sequence evidence for the segmental allotetraploid origin of maize. Proceedings of the National Academy of Sciences of the United States of America, 1997.
[30] Roderic D. M. Page, et al. GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinform., 1998.
[31] David Posada, et al. MODELTEST: testing the model of DNA substitution. Bioinform., 1998.
[32] H. Philippe, et al. New insights into the phylogeny of eukaryotes based on ciliate Hsp70 sequences. Molecular Biology and Evolution, 1998.
[33] G. Buck, et al. The HSP70 Gene Family in Pneumocystis carinii: Molecular and Phylogenetic Characterization of Cytoplasmic Members. The Journal of Eukaryotic Microbiology, 1998.
[34] Population structure of the Australian gummy shark (Mustelus antarcticus Günther) inferred from allozymes, mitochondrial DNA and vertebrae counts. 1998.
[35] C. Borchiellini, et al. Phylogenetic analysis of the Hsp70 sequences reveals the monophyly of Metazoa and specific phylogenetic relationships between animals and fungi. Molecular Biology and Evolution, 1998.
[36] W. Müller, et al. Evolutionary relationships of Metazoa within the eukaryotes based on molecular data from Porifera. Proceedings of the Royal Society of London. Series B: Biological Sciences, 1999.
[37] Hidetoshi Shimodaira, et al. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Molecular Biology and Evolution, 1999.
[38] H. Philippe, et al. Critical Analysis of Eukaryotic Phylogeny: A Case Study Based on the HSP70 Family. The Journal of Eukaryotic Microbiology, 1999.
[39] Sudhir Kumar, et al. Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proceedings of the Royal Society of London. Series B: Biological Sciences, 1999.
[40] Horizontal transfers confuse the prokaryotic phylogeny based on the HSP70 protein family. Molecular Microbiology, 1999.
[41] T. Mukhtar, et al. Evolutionary relationships among photosynthetic prokaryotes (Heliobacterium chlorum, Chloroflexus aurantiacus, cyanobacteria, Chlorobium tepidum and proteobacteria): implications regarding the origin of photosynthesis. Molecular Microbiology, 1999.
[42] I. Sulaiman, et al. Phylogenetic Relationships of Cryptosporidium Parasites Based on the 70-Kilodalton Heat Shock Protein (HSP70) Gene. Applied and Environmental Microbiology, 2000.
[43] Phylogenetic analysis with newly characterized Babesia bovis hsp70 and hsp90 provides strong support for paraphyly within the piroplasms. Molecular and Biochemical Parasitology, 2000.
[44] S. Edwards, et al. Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. 2001.
[45] R. Page. Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny. Molecular Phylogenetics and Evolution, 2000.
[46] W. Doolittle, et al. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science, 2000.
[47] J. Wendel, et al. Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton (Gossypium). Genetics, 2000.
[48] D. G. Brown, et al. The origins of genomic duplications in Arabidopsis. Science, 2000.
[49] M. Lynch, et al. The evolutionary fate and consequences of duplicate genes. Science, 2000.
[50] K. H. Wolfe. Yesterday's polyploids and the mystery of diploidization. Nature Reviews Genetics, 2001.
[51] D. Bhattacharya, et al. Extensive Ribosomal DNA Genic Variation in the Columnar Cactus Lophocereus. Journal of Molecular Evolution, 2001.
[52] John P. Huelsenbeck, et al. MRBAYES: Bayesian inference of phylogenetic trees. Bioinform., 2001.
[53] Y. Van de Peer, et al. Genome duplication, divergent resolution and speciation. Trends in Genetics, 2001.
[54] J. Rosselló, et al. Why nuclear ribosomal DNA spacers (ITS) tell different stories in Quercus. Molecular Phylogenetics and Evolution, 2001.
[55] Michael J. Stanhope, et al. Universal trees based on large combined protein sequence data sets. Nature Genetics, 2001.
[56] S. Hedges, et al. Molecular Evidence for the Early Colonization of Land by Fungi and Plants. Science, 2001.
[57] D. Hartl, et al. Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends in Ecology & Evolution, 2001.
[58] oseph, et al. How Should Species Phylogenies Be Inferred from Sequence Data? 2001.
[59] F. van Roy, et al. The human and murine protocadherin‐β one‐exon gene families show high evolutionary conservation, despite the difference in gene number. FEBS Letters, 2001.
[60] Avin, et al. Amphioxus Mitochondrial DNA, Chordate Phylogeny, and the Limits of Inference Based on Comparisons of Sequences. 2003.
[61] G. B. Golding, et al. Evolution of HSP70 gene and its implications regarding relationships between archaebacteria, eubacteria, and eukaryotes. Journal of Molecular Evolution, 1993.