Sequence of the Sugar Pine Megagenome

Until very recently, complete characterization of the megagenomes of conifers remained elusive. Sugar pine (Pinus lambertiana Dougl.) has a highly repetitive, diploid genome of 31 billion bp. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome “obesity” in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights into the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. As with most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating Cr1, the major dominant gene for simple resistance through a hypersensitive response. We describe new markers and gene annotations that are both tightly linked to Cr1 in a mapping population and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species’ range, creating a solid foundation for future mapping. The genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.
