npInv: accurate detection and genotyping of inversions mediated by non-allelic homologous recombination using long read sub-alignment

Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with non-repetitive breakpoints, leaving inversions that arise through non-allelic homologous recombination (NAHR) between inverted repeats (IRs) largely unexplored. We present npInv, a novel tool designed specifically to detect and genotype NAHR inversions using sub-alignments of long-read sequencing data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IRs shorter than 7 kb. Genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of the novel NAHR inversions. We show that there is a near-linear relationship between the length of the flanking IR and the size of the NAHR inversion.
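
To make the idea of a sub-alignment signal concrete, the following is a minimal sketch and not npInv's actual implementation. It assumes that each long read has already been split by an aligner into sub-alignments, and it treats a strand flip between consecutive sub-alignments on the same chromosome as evidence of an inversion breakpoint. The `SubAlignment` type, the function names, and the `min_fraction` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SubAlignment:
    """One aligned piece of a long read (hypothetical structure)."""
    chrom: str
    ref_start: int
    ref_end: int
    read_start: int  # offset of this piece within the read
    strand: str      # "+" or "-"


def has_inversion_signal(subs: Iterable[SubAlignment]) -> bool:
    """Return True if two consecutive sub-alignments (in read order)
    map to the same chromosome on opposite strands."""
    ordered = sorted(subs, key=lambda s: s.read_start)
    return any(
        a.chrom == b.chrom and a.strand != b.strand
        for a, b in zip(ordered, ordered[1:])
    )


def genotype(n_inverted: int, n_reference: int, min_fraction: float = 0.2) -> str:
    """Naive genotype call from read counts at one breakpoint.
    min_fraction is an assumed cutoff, not a value from the paper."""
    total = n_inverted + n_reference
    if total == 0:
        return "./."  # no spanning reads, no call
    fraction = n_inverted / total
    if fraction < min_fraction:
        return "0/0"
    if fraction > 1 - min_fraction:
        return "1/1"
    return "0/1"


# A read whose second half aligns on the opposite strand of chr1.
read = [
    SubAlignment("chr1", 1_000, 6_000, 0, "+"),
    SubAlignment("chr1", 9_000, 14_000, 5_000, "-"),
]
print(has_inversion_signal(read), genotype(5, 5))
```

A real caller would also need to check that the breakpoints fall within an inverted repeat pair and to model read-level error, neither of which this sketch attempts.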
