Challenges in studying genomic structural variant formation mechanisms: The short‐read dilemma and beyond

Next‐generation sequencing (NGS) technologies have revolutionised the analysis of genomic structural variants (SVs), providing significant insights into de novo SV formation based on analyses of rearrangement breakpoint junctions. The short DNA reads generated by NGS, however, have also created novel obstacles by biasing the ascertainment of SVs, an aspect that we refer to as the ‘short‐read dilemma’. For example, recent studies have found that SVs are often complex: a single formation event can generate large numbers of breakpoints (multi‐breakpoint SVs), and structurally polymorphic loci can have multiple allelic states (multi‐allelic SVs). This complexity may be obscured in short reads unless the data are analysed and interpreted within their wider genomic context. We discuss how novel approaches will help to overcome the short‐read dilemma, and how integrating other sources of information, including the structure of chromatin, may in future deepen our understanding of SV formation processes.
