rbcL and Legume Phylogeny, with Particular Reference to Phaseoleae, Millettieae, and Allies

A parsimony analysis was conducted on 319 rbcL sequences, comprising 242 from 194 genera of Leguminosae and 77 from other families. Results support earlier conclusions from rbcL and other molecular data that a monophyletic Leguminosae is part of a Fabales that includes Polygalaceae, Surianaceae, and the anomalous rosid genus Quillaja. Within legumes, results of previous analyses were also supported, such as the paraphyletic nature of Caesalpinioideae and the monophyly of Mimosoideae and Papilionoideae. Most new data (74 sequences) were from Papilionoideae, particularly Phaseoleae, Millettieae, and allies. Although the overall topology for Papilionoideae was largely unresolved, several large clades were well-supported. The analysis contained a large sample of Phaseoleae and Millettieae and, not surprisingly, showed both tribes to be polyphyletic, though with all taxa except Wisteria and allied Millettieae belonging to a single well-supported clade. Within this clade was a strongly supported group that included Phaseoleae subtribes Erythrininae, Glycininae, Phaseolinae, Kennediinae, and Cajaninae, of which only the last two were monophyletic. Desmodieae and Psoraleeae were also part of this clade. The monophyletic Phaseoleae subtribes Ophrestiinae and Diocleinae grouped with most Millettieae in a clade that included a group similar to the core Millettieae identified in other studies. All but one of the remaining Millettieae sampled formed an additional clade within the overall millettioid/phaseoloid group.
Of the various genes used for plant molecular systematic analyses at higher taxonomic levels, rbcL has been by far the most widely used, particularly for comprehensive analyses of angiosperms, whether alone (e.g., Chase et al. 1993; Kallersjo et al. 1998) or with other genes (e.g., Qiu et al. 1999; Soltis et al. 1999). Although several limitations of rbcL for angiosperm phylogeny reconstruction have been known since the earliest studies (e.g., Chase et al. 1993), the gene continues to be used in part because comparable sampling of a readily alignable sequence does not exist elsewhere. The availability of thousands of rbcL sequences in public databases (over 8,000 as of late 2000), representing all major groups of plants, allows the affinities of taxa whose phylogenetic relationships are unknown to be hypothesized simply and quickly.
The rbcL gene has played a role in the evolving understanding of legume phylogeny. The earliest comprehensive cladistic analyses of legume phylogeny with broad sampling were those of Chappill (1995), using a wide array of characters, and rbcL studies by two groups (Doyle 1995; Kass and Wink 1995). Both groups subsequently expanded these studies (Kass and Wink 1996, 1997a, 1997b; Doyle et al. 1997). Results from these studies were largely concordant with earlier molecular work, confirming for example the monophyly of groups with structural mutations of the chloroplast genome (e.g., Lavin et al. 1990; Doyle et al. 1996), and with long-standing views concerning the monophyly (or lack thereof) of the three subfamilies. Major groups in rbcL topologies were in many cases unresolved or weakly supported, particularly near the base of the tree in the paraphyletic Caesalpinioideae (Doyle et al. 1997). However, several large clades were identified within Papilionoideae, some of which were previously unknown, and several of which were well-supported. Phylogenetic analyses of the combined sequences from the two 1997 studies have not been published, and numerous new legume rbcL sequences have been generated since then. Moreover, none of the legume rbcL phylogenies included many outgroups. The sister group relationships of legumes are controversial, with molecular results in conflict with traditional views.
516 SYSTEMATIC BOTANY [Volume 26
The availability of a large number of rbcL sequences both from legumes and from putatively related taxa makes it possible to study the effect of extensive legume sampling on outgroup relationships, and of outgroup sampling on topologies within legumes. With recent improvements in computer hardware and software, as well as in search strategies, it is now possible to perform more thorough parsimony searches of tree space for large data sets (e.g., Nixon 1999). The goal of this paper is to conduct such a parsimony analysis on the large number of available legume rbcL sequences and numerous outgroups.
Taxon Sampling. The sample of approximately 250 Leguminosae sequences publicly available at the commencement of this project was biased toward some groups, particularly the papilionoid tribe Genisteae, which had been the focus of studies by the Wink laboratory (e.g., Kass and Wink 1997a, b). There was some overlap in the genera, and in some cases even the species, sampled by our group (Doyle et al. 1997) and by the Wink group (Kass and Wink 1997b). Initial parsimony analyses were conducted in order to develop a data set that minimized redundancy and excessive sampling of genera such as Lupinus L. (Kass and Wink 1997a). Relatively few of our many Desmodieae sequences were included here, because relationships in this tribe will be discussed elsewhere. No more than two sequences were retained for any genus whose sequences were monophyletic in such analyses; for genera whose sequences did not form monophyletic groups (e.g., Sophora L.), all sequences were used. For species represented by multiple sequences but not belonging to multiply-sampled genera, all sequences were used unless they were identical. The resulting data set of 242 legume sequences represented 194 genera (Table 1) and included 74 new sequences.
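The retention rules above amount to a simple filter. The Python sketch below is one reading of them, not the authors' procedure. It assumes a hypothetical record format of (genus, species, sequence) tuples, treats a genus as "multiply sampled" when more than one of its species is present, takes genus monophyly from the preliminary analyses as a supplied lookup (the hypothetical `genus_is_monophyletic`), and, because the text does not say which two sequences were retained, simply keeps the first two.

```python
from collections import defaultdict


def reduce_sampling(records, genus_is_monophyletic):
    """Filter (genus, species, sequence) records by the retention rules.

    genus_is_monophyletic (hypothetical input) maps each multiply-sampled
    genus to whether its sequences grouped together in preliminary analyses.
    """
    by_genus = defaultdict(list)
    for record in records:
        by_genus[record[0]].append(record)
    kept = []
    for genus, group in by_genus.items():
        if len({species for _, species, _ in group}) > 1:  # multiply-sampled genus
            # Monophyletic: at most two sequences (keeping the first two is an
            # assumption). Non-monophyletic: every sequence is kept.
            kept.extend(group[:2] if genus_is_monophyletic[genus] else group)
        else:
            # Single species: keep all of its sequences except exact duplicates.
            seen = set()
            for record in group:
                if record[2] not in seen:
                    seen.add(record[2])
                    kept.append(record)
    return kept
```

Under this reading, a genus flagged as non-monophyletic (as Sophora was) passes through unchanged, while a single-species genus loses only verbatim duplicate sequences.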
Emphasis was on Papilionoideae, with sequences from 164 of the 451 genera and all 30 of the tribes recognized by Polhill (1994), whose classification is used throughout this section. For Caesalpinioideae, 24 of 151 genera were included, representing all four tribes (Caesalpinieae, Cassieae, Cercideae, Detarieae). This included five of the nine informal "groups" of Caesalpinieae, four of the five subtribes of Cassieae, and both subtribes of Cercideae. However, only five genera were included from the large (81 genera) Detarieae, representing four of the 10 informal "groups." Outside of Detarieae, sampling deficiencies were due mostly to the difficulty of obtaining usable material. For example, numerous attempts to obtain sequences from collections of Duparquetia Baill. (Cassieae: Duparquetiinae) and Poeppigia Presl (Caesalpinieae: Poeppigia group) were unsuccessful. Sampling was lowest for Mimosoideae, with only six genera represented. However, this subfamily has been assumed to be monophyletic. Seventy-seven sequences from families other than Leguminosae were also included (Table 1). These were chosen to represent: 1) families shown by previous comprehensive rbcL analyses (e.g., Chase et al. 1993; Soltis et al. 1995; Kallersjo et al. 1998) to belong to clades near legumes; 2) families hypothesized to be near legumes on the basis of morphology, chemistry, and other non-molecular data (Dickison 1981; Thorne 1992); and 3) families identified as close to legumes by the molecular, non-molecular, or combined analyses of Nandi et al. (1998). Asarum (Aristolochiaceae) was included as the outgroup to this assembly of largely "rosid" taxa. One new sequence was added, from Byrsocarpus coccinea, as a check on the position of Connaraceae, a key family from which only a single sequence (from Connarus conchocarpus) was publicly available.
Phylogenetic Analysis.
The first 1,434 bases of the rbcL gene were aligned in Winclada (Nixon 1999b); the first 30 positions, corresponding to the forward amplification primer, were not used in analyses. Approximately 2% of the data at the 530 parsimony-informative sites were missing, primarily at the extreme 3' or 5' ends of sequences. Among legume taxa, only partial sequences were available for Dialium (335 of 530 informative positions), Hymenaea protera (122/530), Hymenolobium excelsum (235/530), Fordia cauliflora (269/530), and Strongylodon macrobotrys (316/530). The data matrix is available at TreeBASE (http://www.herbaria.harvard.edu/treebase/) as study accession number S578. Parsimony analyses were conducted using NONA (Goloboff 1994), with nucleotide characters treated as unordered and equally weighted. Searches were conducted using the "parsimony ratchet" strategy, which has been shown to be very effective with data sets in excess of 500 terminals (Nixon 1999a), sampling tree space more efficiently than conventional methods (e.g., many iterations of random taxon addition, optimizing all characters under equal weights). A typical ratchet analysis begins with a conventional starting tree from randomly ordered taxa (a single random addition sequence) and then initiates an iterative analysis consisting of the following steps: 1) perturbation of the matrix by increasing the weights of, or eliminating, a random small subset of characters; 2) branch swapping under the perturbed weights to obtain one representative shortest tree; 3) resetting weights to original values; 4) branch swapping with equal weights using the perturbed tree as the starting tree. The cycle is repeated by starting with the tree that resulted from the previous iteration and perturbing the data to start step one over again. A large number of iterations are conducted in a single ratchet analysis, with all equally parsimonious trees being retained.
2001] KAJITA ET AL.: rbcL AND LEGUME PHYLOGENY 517
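To make the optimality criterion and the four-step cycle concrete, here is a minimal Python sketch; it is not the NONA implementation used in the study. It scores trees by Fitch parsimony with unordered characters and optional per-site weights, lists parsimony-informative columns, and runs a simplified ratchet. Several simplifications are assumptions of this sketch: trees are nested tuples of taxon names; branch swapping only prunes and regrafts single leaves rather than using TBR; perturbation only upweights characters (to weight 2) rather than also eliminating them; the subset size (`fraction`) and number of iterations are arbitrary; and only one shortest tree is kept, whereas the analysis retained all equally parsimonious trees. All function names are hypothetical.

```python
import random
from collections import Counter

DNA = frozenset("ACGT")

def fitch(tree, matrix, weights):
    """Fitch parsimony on a tree of nested 2-tuples: (state sets, weighted length)."""
    if isinstance(tree, str):
        # Unordered bases; '?' (missing data) may be any base (assumption).
        return [DNA if c == "?" else frozenset(c) for c in matrix[tree]], 0
    left_sets, left_cost = fitch(tree[0], matrix, weights)
    right_sets, right_cost = fitch(tree[1], matrix, weights)
    sets, cost = [], left_cost + right_cost
    for a, b, w in zip(left_sets, right_sets, weights):
        if a & b:
            sets.append(a & b)
        else:  # disjoint state sets: one step, scaled by the site weight
            sets.append(a | b)
            cost += w
    return sets, cost

def length(tree, matrix, weights):
    return fitch(tree, matrix, weights)[1]

def leaves(tree):
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

def remove_leaf(tree, leaf):
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == leaf:
        return right
    if right == leaf:
        return left
    return (remove_leaf(left, leaf), remove_leaf(right, leaf))

def insertions(tree, leaf):
    # Every tree obtained by attaching `leaf` to one branch of `tree`.
    yield (tree, leaf)
    if not isinstance(tree, str):
        for new_left in insertions(tree[0], leaf):
            yield (new_left, tree[1])
        for new_right in insertions(tree[1], leaf):
            yield (tree[0], new_right)

def hill_climb(tree, matrix, weights):
    # Simplified branch swapping (assumption): prune and regraft single leaves.
    best = length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for leaf in leaves(tree):
            for cand in insertions(remove_leaf(tree, leaf), leaf):
                cand_len = length(cand, matrix, weights)
                if cand_len < best:
                    tree, best, improved = cand, cand_len, True
                    break
            if improved:
                break
    return tree, best

def ratchet(matrix, iterations=20, fraction=0.15, seed=0):
    rng = random.Random(seed)
    nchar = len(next(iter(matrix.values())))
    equal = [1] * nchar
    order = list(matrix)
    rng.shuffle(order)  # single random addition sequence for the starting tree
    tree = (order[0], order[1])
    for taxon in order[2:]:
        tree = min(insertions(tree, taxon), key=lambda t: length(t, matrix, equal))
    tree, best_len = hill_climb(tree, matrix, equal)
    best_tree = tree
    for _ in range(iterations):
        # 1) perturb: upweight a random subset of characters (assumption: weight 2)
        perturbed = equal[:]
        for i in rng.sample(range(nchar), max(1, int(fraction * nchar))):
            perturbed[i] = 2
        tree, _ = hill_climb(tree, matrix, perturbed)  # 2) swap under perturbed weights
        tree, tree_len = hill_climb(tree, matrix, equal)  # 3-4) reset weights, swap again
        if tree_len < best_len:
            best_tree, best_len = tree, tree_len
    return best_tree, best_len

def informative_sites(matrix):
    # Informative column: at least two bases, each present in at least two taxa.
    return [i for i, col in enumerate(zip(*matrix.values()))
            if sum(n >= 2 for n in Counter(c for c in col if c in DNA).values()) >= 2]

# Made-up toy matrix: three pairs of identical sequences.
toy = {"A": "AAAACC", "B": "AAAACC", "C": "CCAAAA",
       "D": "CCAAAA", "E": "CCCCCC", "F": "CCCCCC"}
best_tree, steps = ratchet(toy, iterations=5)
print(steps)  # 6
```

In the toy matrix every column is informative, and a tree reaches the minimum of 6 steps only when each identical pair forms a cherry. As in the cycle described above, each iteration starts from the tree left by the previous one, so the perturbation moves the search away from its current solution rather than restarting it from scratch.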
The efficiency of this method is attributed to the fact that shortest trees found with perturbed characters are not most-parsimonious solutions for the original data, but are close enough that they serve as excellent starting trees for unperturbed analyses. Alternating between the perturbed and original weighting schemes also allows the search to jump quickly between tree islands. The use of such starting trees is a major improvement over conventional random addition trees, which are far from parsimonious and require considerable searching to achieve near-optimality (Nixon 1999a). Ratchets were impleme

[1] R. Thorne, et al. Classification and geography of the flowering plants, 1992, The Botanical Review.

[2] B. Wyk, et al. Evolutionary relationships in the Podalyrieae and Liparieae (Fabaceae) based on morphological, cytological, and chemical evidence, 1998, Plant Systematics and Evolution.

[3] M. Wink, et al. Molecular phylogeny and phylogeography of Lupinus (Leguminosae) inferred from nucleotide sequences of the rbcL gene and ITS 1 + 2 regions of rDNA, 1997, Plant Systematics and Evolution.

[4] R. Pennington, et al. Phylogenetic Relationships of Basal Papilionoid Legumes Based Upon Sequences of the Chloroplast trnL Intron, 2009.

[5] R. Pennington, et al. The dalbergioid legumes (Fabaceae): delimitation of a pantropical monophyletic clade, 2001, American Journal of Botany.

[6] M. Sanderson, et al. Phylogenetic systematics of the tribe Millettieae (Leguminosae) based on chloroplast trnK/matK sequences and its implications for evolutionary patterns in Papilionoideae, 2000, American Journal of Botany.

[7] K. Nixon, et al. The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis, 1999, Cladistics: The International Journal of the Willi Hennig Society.

[8] Mark W. Chase, et al. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes, 1999, Nature.

[9] Jerrold I. Davis, et al. Data decisiveness, data quality, and incongruence in phylogenetic analysis: an example from the monocotyledons using mitochondrial atpA sequences, 1998, Systematic Biology.

[10] S. Mathews, et al. Monophyletic subgroups of the tribe Millettieae (Leguminosae) as revealed by phytochrome nucleotide sequence data, 1998.

[11] M. Chase, et al. A combined cladistic analysis of angiosperms using rbcL and non-molecular data sets, 1998.

[12] J. Doyle, et al. A phylogeny of the chloroplast gene rbcL in the Leguminosae: taxonomic correlations and insights into the evolution of nodulation, 1997, American Journal of Botany.

[13] W. John Kress, et al. Angiosperm Phylogeny Inferred from 18S Ribosomal DNA Sequences, 1997.

[14] S. Tucker. Trends in evolution of floral ontogeny in Cassia sensu stricto, Senna, and Chamaecrista (Leguminosae: Caesalpinioideae: Cassieae: Cassiinae); a study in convergence, 1996.

[15] J. Farris, et al. Parsimony jackknifing outperforms neighbor-joining, 1996, Cladistics: The International Journal of the Willi Hennig Society.

[16] J. Palmer, et al. Multiple Independent Losses of Two Genes and One Intron from Legume Chloroplast Genomes, 1995.

[17] D. Soltis, et al. Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[18] F. Breteler. The boundary between Amherstieae and Detarieae (Caesalpinioideae), 1995.

[19] J. Doyle, et al. Chloroplast DNA Phylogeny of the Papilionoid Legume Tribe Phaseoleae, 1993.

[20] James F. Smith. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL, 1993.

[21] M. Mizuno, et al. On the Genus Euchresta Benn. (Leguminosae) with “Wallace's Line”, 1992.

[22] J. Palmer, et al. Evolutionary significance of the loss of the chloroplast-DNA inverted repeat in the Leguminosae subfamily Papilionoideae, 1990, Evolution.

[23] J. Sprent, et al. Occurrence of nodulation in the Leguminosae, 1989, The New Phytologist.

[24] W. R. Anderson, et al. An Integrated System of Classification of Flowering Plants, 1982.