Genomic Rearrangements in Arabidopsis Considered as Quantitative Traits

Structural rearrangements can have unexpected effects on quantitative phenotypes. Surprisingly, these rearrangements can also be considered as... To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, evidence for a potential structural variant at a locus comes from variation, across individuals, in the counts of short reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. An association between a structural variant trait at one locus and genotypes at a distant locus indicates the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and for resistance to infection by the fungus Albugo laibachii, isolate Nc14. Further, we show that the phenotypic heritability attributable to read-mapping anomalies differs from that due to standard genetic variation and, in the case of time to germination and bolting, exceeds it. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low coverage, and is particularly suited to mapping transpositions.
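
The core idea, treating anomalous read counts at a locus as a quantitative trait and scanning it against genotypes, can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses simulated data and a plain squared Pearson correlation in place of whatever association model the study applied. The function names (`sv_trait_scan`, `simulate`, `classify`), the marker indices, the read depths, and the `window` threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def sv_trait_scan(counts, depth, genotypes):
    """Squared Pearson correlation between an SV trait and every marker.

    counts    : (n_lines,) anomalously mapped read counts at one locus
    depth     : (n_lines,) total mapped reads per line, used to normalize
    genotypes : (n_lines, n_markers) parental allele codes (0 or 1)
    Assumes every marker is polymorphic and the trait is not constant.
    """
    trait = counts / depth                      # normalize by sequencing depth
    t = trait - trait.mean()
    g = genotypes - genotypes.mean(axis=0)
    num = (g * t[:, None]).sum(axis=0) ** 2
    den = (g ** 2).sum(axis=0) * (t ** 2).sum()
    return num / den

def simulate(n_lines=488, n_markers=200, causal=150, seed=0):
    """Toy inbred population with linked 0/1 markers (illustrative only)."""
    rng = np.random.default_rng(seed)
    geno = np.empty((n_lines, n_markers), dtype=int)
    geno[:, 0] = rng.integers(0, 2, n_lines)
    for j in range(1, n_markers):
        # each marker differs from its neighbour in about 5% of lines
        flip = rng.random(n_lines) < 0.05
        geno[:, j] = np.where(flip, 1 - geno[:, j - 1], geno[:, j - 1])
    depth = rng.poisson(200_000, n_lines)
    # lines carrying allele 1 at the causal marker yield more anomalous reads
    rate = 2e-5 + 1e-4 * geno[:, causal]
    counts = rng.poisson(rate * depth)
    return counts, depth, geno

def classify(peak_marker, trait_marker, window=10):
    """Label an association as local or distant (hypothetical threshold)."""
    return "local" if abs(peak_marker - trait_marker) <= window else "distant"

if __name__ == "__main__":
    counts, depth, geno = simulate()
    r2 = sv_trait_scan(counts, depth, geno)
    peak = int(np.argmax(r2))
    # suppose the anomalous reads were observed at marker index 20
    print(peak, classify(peak, trait_marker=20))
```

In this toy setup the trait is observed at one position while its strongest association lies at a different, distant marker, which is the pattern the abstract interprets as the origin and target of a transposition. A real analysis would also need significance thresholds and control for population structure, which this sketch omits.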
