Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels

Background

Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene- and genome-duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are single copy in each of the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses. These examine GO annotations, coding sequence length, number of exons, number of domains, and presence in distant lineages such as Selaginella and Physcomitrella. They also include phylogenetic analyses to estimate copy number in other seed plants and to demonstrate the genes' phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using extant coverage in EST databases for seed plants and by de novo amplification via RT-PCR in the family Brassicaceae.

Results

There are 959 single copy nuclear genes shared by Arabidopsis, Populus, Vitis and Oryza ("APVO SSC genes"). The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants and appear to exist as single or very low copy genes. Exceptions are seen in recently polyploid taxa and in lineages where there is significant evidence for a shared large-scale duplication event.
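
The operational definition of 'single copy' used above can be made concrete as a filter over gene-family clusters. The sketch below is illustrative only and is not the authors' pipeline. It assumes the clustering step (for example, MCL-tribe) has already been run on the four proteomes. It also assumes the output has been parsed into hypothetical (cluster_id, species, gene_id) triples. It keeps only clusters containing exactly one gene from each of Arabidopsis, Populus, Vitis and Oryza and no genes from any other species; it does not reproduce the clustering itself.

```python
from collections import defaultdict

# Target genomes for the shared single-copy criterion.
SPECIES = ("Arabidopsis", "Populus", "Vitis", "Oryza")


def shared_single_copy(assignments, species=SPECIES):
    """Return {cluster_id: {species: gene_id}} for clusters that contain
    exactly one gene from each target species and no genes from other species.

    `assignments` is an iterable of hypothetical (cluster_id, species, gene_id)
    triples parsed from a clustering run (e.g. MCL-tribe output)."""
    members = defaultdict(lambda: defaultdict(list))
    for cluster_id, sp, gene_id in assignments:
        members[cluster_id][sp].append(gene_id)

    required = set(species)
    shared = {}
    for cluster_id, by_species in members.items():
        # Every target genome must be represented, and nothing else.
        if set(by_species) != required:
            continue
        # Each genome must contribute exactly one gene (no paralogs).
        if all(len(genes) == 1 for genes in by_species.values()):
            shared[cluster_id] = {sp: genes[0] for sp, genes in by_species.items()}
    return shared


# Toy input (illustrative gene IDs, not real loci).
demo = [
    ("c1", "Arabidopsis", "At1"), ("c1", "Populus", "Pt1"),
    ("c1", "Vitis", "Vv1"), ("c1", "Oryza", "Os1"),
    ("c2", "Arabidopsis", "At2"), ("c2", "Arabidopsis", "At3"),
    ("c2", "Populus", "Pt2"), ("c2", "Vitis", "Vv2"), ("c2", "Oryza", "Os2"),
    ("c3", "Arabidopsis", "At4"), ("c3", "Populus", "Pt3"),
]
print(sorted(shared_single_copy(demo)))  # ['c1']
```

In this toy input only c1 passes. Cluster c2 fails because Arabidopsis contributes two genes, and c3 fails because it has no Vitis or Oryza members. Relaxing the criterion, for instance to tolerate genes from additional species in a cluster, only requires changing the set comparison.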

Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown. Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. We also analyzed sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases. This analysis yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Single gene phylogenies that rely on EST sequences have limited bootstrap support as a result of limited sequence information. By contrast, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics. They contain a greater proportion of phylogenetically informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.

Conclusions

Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.
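
The Results above report two things: concatenated alignments gave strongly supported trees, and these genes contain a greater proportion of phylogenetically informative sites than commonly used organellar genes. The sketch below illustrates the two underlying operations under stated assumptions.

* Per-gene alignments are given as hypothetical {taxon: aligned_sequence} dictionaries.
* Taxa absent from a gene are padded with '?' as missing data.
* A site counts as parsimony-informative when at least two character states each occur in at least two taxa, ignoring gaps and missing data.

The abstract does not state exactly how informative sites were counted, so this is the standard parsimony definition rather than the authors' procedure. Tree inference and bootstrapping would be done downstream with dedicated software.

```python
from collections import Counter

# Symbols treated as gaps or missing data (assumed nucleotide encoding).
MISSING = {"-", "?", "N"}


def concatenate(alignments):
    """Join per-gene alignments ({taxon: aligned_seq}) into one supermatrix.
    Taxa not sampled for a gene are padded with '?' for that gene's length."""
    taxa = sorted({t for aln in alignments for t in aln})
    parts = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "?" * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def parsimony_informative_fraction(matrix):
    """Fraction of columns where at least two character states each occur in
    at least two taxa, ignoring gaps and missing data."""
    seqs = list(matrix.values())
    n_cols = len(seqs[0])
    informative = 0
    for i in range(n_cols):
        states = Counter(s[i].upper() for s in seqs if s[i].upper() not in MISSING)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative / n_cols


# Toy alignments for two genes; taxon t3 was not sampled for gene_b.
gene_a = {"t1": "ACGT", "t2": "ACGA", "t3": "TCGA", "t4": "TCGT"}
gene_b = {"t1": "GG", "t2": "GA", "t4": "AA"}
sm = concatenate([gene_a, gene_b])
print(sm["t3"])                                      # TCGA??
print(round(parsimony_informative_fraction(sm), 3))  # 0.333
```

In this toy supermatrix only the first and fourth columns are informative, because each splits the taxa into two states that each occur twice. In the last two columns one state appears in only a single taxon, so 2 of 6 sites qualify.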
