A multi-platform reference for somatic structural variation detection

Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is partly due to the lack of high-quality gold standard datasets for benchmarking experimental approaches and bioinformatic analysis pipelines for comprehensive somatic SV detection. Here, we addressed this challenge by performing genome-wide somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different technologies: Illumina HiSeq, Oxford Nanopore, Pacific Biosciences and 10x Genomics. Based on evidence from multiple technologies combined with extensive experimental validation, including Bionano optical mapping data and targeted detection of candidate breakpoint junctions, we compiled a comprehensive set of true somatic SVs spanning all SV types. We demonstrate the utility of this resource by determining the SV detection performance of each technology as a function of tumor purity and sequencing depth, highlighting the importance of assessing these parameters in cancer genomics projects and in the evaluation of data analysis tools. The reference truth set of somatic SVs, as well as the underlying raw multi-platform sequencing data, are freely available and form an important resource for community somatic benchmarking efforts.
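As an illustration of how a truth set like this can be used to benchmark a caller, the sketch below matches called SVs to truth SVs by breakpoint proximity and reports precision, recall and F1. It is not the evaluation procedure used in this work: the `SV` record, the `benchmark` function and the 100 bp breakpoint tolerance are hypothetical choices, and SV type is deliberately ignored when matching. Repeating such a comparison on data sampled at different tumor purities and sequencing depths yields performance curves of the kind described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: two breakends (chromosome, position)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def matches(call: SV, truth: SV, tol: int) -> bool:
    """True if both breakends of `call` lie within `tol` bp of `truth`'s breakends.

    Breakend order is not assumed, so the swapped orientation is also checked.
    SV type is NOT compared (an assumption made for simplicity).
    """
    same = (call.chrom1 == truth.chrom1 and call.chrom2 == truth.chrom2
            and abs(call.pos1 - truth.pos1) <= tol
            and abs(call.pos2 - truth.pos2) <= tol)
    swapped = (call.chrom1 == truth.chrom2 and call.chrom2 == truth.chrom1
               and abs(call.pos1 - truth.pos2) <= tol
               and abs(call.pos2 - truth.pos1) <= tol)
    return same or swapped


def benchmark(calls: list[SV], truth: list[SV], tol: int = 100) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of `calls` against `truth`.

    Precision: fraction of calls matching at least one truth SV.
    Recall: fraction of truth SVs matched by at least one call.
    """
    matched_truth: set[int] = set()
    true_positive_calls = 0
    for call in calls:
        hit = False
        for i, t in enumerate(truth):
            if matches(call, t, tol):
                matched_truth.add(i)
                hit = True
        if hit:
            true_positive_calls += 1
    precision = true_positive_calls / len(calls) if calls else 0.0
    recall = len(matched_truth) / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up coordinates for illustration only.
    truth = [SV("1", 1000, "1", 5000), SV("3", 200, "7", 900)]
    calls = [SV("1", 1050, "1", 4980), SV("2", 10, "2", 20)]
    print(benchmark(calls, truth))  # (0.5, 0.5, 0.5)
```

Counting precision over calls and recall over truth SVs separately keeps the two metrics well defined when one call overlaps several truth SVs, or several calls hit the same truth SV.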
