Picky Comprehensively Detects High-Resolution Structural Variants in Nanopore Long Reads

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as a common structural feature associated with SVs. Breakpoint density across the genome was associated with the propensity for interchromosomal connectivity and was enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data. The computational pipeline Picky detects the full spectrum of structural variants and their breakpoints in long nanopore sequence reads.
