Decoding the oak genome: public release of sequence data, assembly, annotation and publication strategies

The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. We established a whole-genome shotgun (WGS) strategy for the challenges of sequencing such a large, complex and highly heterozygous genome. This strategy avoids costly and time-consuming hierarchical approaches such as fosmid or BAC clone-based sequencing. It combined long and short reads. Over 49 million reads generated with Roche 454 GS-FLX technology were assembled into contigs. These contigs were then combined with shorter Illumina reads from paired-end and mate-pair libraries of different insert sizes to build scaffolds. Errors were corrected, gaps were filled with Illumina paired-end reads, and contaminants were detected. The result was 17 910 scaffolds (>2 kb) totalling 1.34 Gb. Fifty per cent of the assembly length was contained in the 1468 longest scaffolds (N50 of 260 kb). An initial comparison with the gene models of the phylogenetically related Prunus persica (peach) indicated that our assembly contains genes for 84.6% of peach proteins, with a mean protein coverage of 90.5%. The second and third steps of the project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories ahead of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
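
The scaffold summary figures (count above 2 kb, total length, N50, and the number of scaffolds making up half the assembly, often called L50) follow from a simple calculation over scaffold lengths. The sketch below is a minimal illustration and not the consortium's pipeline. The function name assembly_stats and the toy lengths are hypothetical. It assumes ">2 kb" means strictly longer than 2000 bp. Scaffolds are sorted from longest to shortest and their lengths accumulated. N50 is the length of the scaffold at which the running total first reaches half of the total length, and L50 is that scaffold's rank.

```python
from collections.abc import Iterable


def assembly_stats(lengths: Iterable[int], min_len: int = 2000) -> tuple[int, int, int, int]:
    """Return (count, total_bp, N50, L50) for scaffolds longer than min_len bp.

    Hypothetical helper for illustration; assumes a strict '>' length filter.
    """
    # Keep only scaffolds above the length threshold, longest first.
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    total = sum(kept)
    running = 0
    for rank, n in enumerate(kept, start=1):
        running += n
        # First scaffold at which the cumulative length reaches half the total.
        if 2 * running >= total:
            return len(kept), total, n, rank
    # No scaffold passed the filter.
    return 0, 0, 0, 0


print(assembly_stats([5000, 4000, 3000, 2500, 1000, 2000]))
```

With these toy lengths, only the four scaffolds above 2000 bp are kept (14 500 bp in total). The first two reach half of that total, so N50 is 4000 bp and L50 is 2. In the terms used above, the oak assembly corresponds to an N50 of 260 kb and an L50 of 1468.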
