Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line

The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation lies in complex and repetitive regions where it may be underreported. To address this, we sequenced SK-BR-3 using long-read single-molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) available for a cancer genome, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discovered a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.
