Discovery of large genomic inversions using pooled clone sequencing

Motivation: There are many different forms of genomic structural variation, which can be broadly classified as copy number variation (CNV) and balanced rearrangements. Although many algorithms that aim to characterize CNVs are now available in the literature, discovery of balanced rearrangements (inversions and translocations) remains an open problem. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, which reduce the mappability of short reads. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, these are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing (HTS) technologies.

Results: Here we propose to characterize large genomic inversions using a sequencing method (Kitzman et al., 2011) originally developed to improve haplotype resolution. This method, called pooled clone sequencing, merges the advantages of the clone-based sequencing approach with the speed and cost efficiency of HTS technologies. Using data generated with the pooled clone sequencing method, we developed a novel algorithm, dipSeq, to discover large inversions (>500 Kbp). We show the power of dipSeq first on simulated data, and then apply it to the genome of a HapMap individual (NA12878). We were able to accurately discover all previously known and experimentally validated large inversions in the same genome. We also identified a novel inversion and confirmed it using fluorescent in situ hybridization.

Availability: An implementation of the dipSeq algorithm is available at https://github.com/BilkentCompGen/dipseq

Contact: calkan@cs.bilkent.edu.tr, francesca.antonacci@uniba.it

[1]  Mark J. P. Chaisson,et al.  Resolving the complexity of the human genome using single-molecule sequencing , 2014, Nature.

[2]  Peter H. Sudmant,et al.  Palindromic GOLGA8 core duplicons promote chromosome 15q13.3 microdeletion and evolutionary instability , 2014, Nature Genetics.

[3]  Dmitry Pushkarev,et al.  Whole-genome haplotyping using long reads and statistical methods , 2014, Nature Biotechnology.

[4]  G. Weinstock,et al.  TIGRA: A targeted iterative graph routing assembler for breakpoint assembly , 2014, Genome research.

[5]  Lorena Pantano,et al.  InvFEST, a database integrating information of polymorphic inversions in the human genome , 2013, Nucleic Acids Res..

[6]  Onur Mutlu,et al.  Accelerating read mapping with FastHASH , 2013, BMC Genomics.

[7]  Ryan M. Layer,et al.  LUMPY: a probabilistic framework for structural variant discovery , 2012, Genome Biology.

[8]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[9]  Kenneth K. Kidd,et al.  Structural Diversity and African Origin of the 17q21.31 Inversion Polymorphism , 2012, Nature Genetics.

[10]  Jessica C. Ebert,et al.  Accurate whole genome sequencing and haplotyping from 10-20 human cells , 2012, Nature.

[11]  Dario Strbenac,et al.  Savant Genome Browser 2: visualization and analysis for population-scale genomics , 2012, Nucleic Acids Res..

[12]  Benjamin J. Raphael,et al.  An integrative probabilistic model for identification of structural variation in sequencing data , 2012, Genome Biology.

[13]  Fangqing Zhao,et al.  inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data , 2011, Nucleic Acids Res..

[14]  Arcadi Navarro,et al.  Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee. , 2011, Genome research.

[15]  Bradley P. Coe,et al.  Genome structural variation discovery and genotyping , 2011, Nature Reviews Genetics.

[16]  Yiping Shen,et al.  Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. , 2011, American journal of human genetics.

[17]  Andrew C. Adey,et al.  Haplotype-resolved genome sequencing of a Gujarati Indian individual , 2011, Nature Biotechnology.

[18]  Kenny Q. Ye,et al.  Mapping copy number variation by population scale genome sequencing , 2010, Nature.

[19]  Andrew C. Adey,et al.  Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition , 2010, Genome Biology.

[20]  David C. Schwartz,et al.  A large, complex structural polymorphism at 16p12.1 underlies microdeletion disease risk , 2010, Nature Genetics.

[21]  Tomas W. Fitzgerald,et al.  Origins and functional impact of copy number variation in the human genome , 2010, Nature.

[22]  Ira M. Hall,et al.  Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. , 2010, Genome research.

[23]  C. Amemiya,et al.  Development and analysis of a germline BAC resource for the sea lamprey, a vertebrate that undergoes substantial chromatin diminution , 2010, Chromosoma.

[24]  Yong-shu He,et al.  [Structural variation in the human genome]. , 2009, Yi chuan = Hereditas.

[25]  Paul Medvedev,et al.  Computational methods for discovering structural variation with next-generation sequencing , 2009, Nature Methods.

[26]  Kenny Q. Ye,et al.  Sensitive and accurate detection of copy number variants using read depth of coverage. , 2009, Genome research.

[27]  J. Kitzman,et al.  Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing , 2009, Nature Genetics.

[28]  Kai Ye,et al.  Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads , 2009, Bioinform..

[29]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows-Wheeler transform , 2009, Bioinform..

[30]  Süleyman Cenk Sahinalp,et al.  Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes , 2009, RECOMB.

[31]  Zhaoshi Jiang,et al.  Characterization of six human disease-associated inversion polymorphisms , 2009, Human molecular genetics.

[32]  Zhaoshi Jiang,et al.  Evolutionary toggling of the MAPT 17q21.31 inversion region , 2008, Nature Genetics.

[33]  Joshua M. Korn,et al.  Mapping and sequencing of structural variation from eight human genomes , 2008, Nature.

[34]  Mauro Brunato,et al.  On Effectively Finding Maximal Quasi-cliques in Graphs , 2008, LION.

[35]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.

[36]  D. Conrad,et al.  Global variation in copy number in the human genome , 2006, Nature.

[37]  R. Pfundt,et al.  A new chromosome 17q21.31 microdeletion syndrome associated with a common inversion polymorphism , 2006, Nature Genetics.

[38]  Pardis C Sabeti,et al.  Common deletion polymorphisms in the human genome , 2006, Nature Genetics.

[39]  N. Niikawa,et al.  Non-hotspot-related breakpoints of common deletions in Sotos syndrome are located within destabilised DNA regions , 2005, Journal of Medical Genetics.

[40]  E. Eichler,et al.  Fine-scale structural variation of the human genome , 2005, Nature Genetics.

[41]  H. Stefánsson,et al.  A common inversion under selection in Europeans , 2005, Nature Genetics.

[42]  L. Feuk,et al.  Detection of large-scale variation in the human genome , 2004, Nature Genetics.

[43]  Kenny Q. Ye,et al.  Large-Scale Copy Number Polymorphism in the Human Genome , 2004, Science.

[44]  Xavier Estivill,et al.  Genomic inversions of human chromosome 15q11-q13 in mothers of Angelman syndrome patients with class II (BP2/3) deletions. , 2003, Human molecular genetics.

[45]  Stephen W. Scherer,et al.  A 1.5 million–base pair inversion polymorphism in families with Williams-Beuren syndrome , 2001, Nature Genetics.

[46]  International Human Genome Sequencing Consortium.  Initial sequencing and analysis of the human genome , 2001, Nature.