A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis)

Background
Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL) cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and future efforts at conifer genome annotation.

Results
As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized, or full-length-enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries and developed a resource of 6,464 high-quality, sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.

Conclusion
This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third-largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm.
Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound-, or elicitor-treated spruce tissues, together with normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources in comparative and functional genomics are discussed.
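The relationship between contigs, singletons, and PUTs reported in the Results can be illustrated with a minimal sketch. It assumes pairwise EST overlaps have already been detected and are supplied as input. It groups ESTs transitively with a union-find structure and, as a simplification, treats every multi-EST cluster as one contig and every unclustered EST as a singleton. This is not the assembler used in the study; real assemblers can split a cluster into several contigs. The function name `count_puts` and the toy EST identifiers are hypothetical.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set structure used to group ESTs that overlap transitively."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Walk to the root, halving the path as we go
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def count_puts(est_ids, overlaps):
    """Hypothetical helper: return (contigs, singletons, PUTs).

    Simplifying assumption: each cluster of two or more ESTs counts as one
    contig, and each EST with no overlap partner counts as one singleton.
    """
    uf = UnionFind(est_ids)
    for a, b in overlaps:
        uf.union(a, b)
    # Number of ESTs in each cluster, keyed by the cluster's root
    sizes = Counter(uf.find(e) for e in est_ids)
    contigs = sum(1 for n in sizes.values() if n >= 2)
    singletons = sum(1 for n in sizes.values() if n == 1)
    return contigs, singletons, contigs + singletons
```

For example, six toy ESTs with overlaps e1-e2, e2-e3, and e4-e5 form two contigs and one singleton (e6), giving three PUTs. The same bookkeeping applies to the reported totals: 19,941 contigs plus 26,804 singletons gives 46,745 PUTs.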
