Evaluation on Efficient Detection of Structural Variants at Low Coverage by Long-Read Sequencing

Structural variants (SVs) in the human genome are implicated in a variety of human diseases. Long-read sequencing (such as that from PacBio) delivers much longer reads than short-read sequencing (such as that from Illumina) and may greatly improve SV detection. However, because long-read sequencing is relatively expensive, users often face practical questions such as what coverage is needed and how best to use the available aligners and SV callers. Here, we evaluated the SV calling performance of three SV calling algorithms (PBHoney-Tails, PBHoney-Spots and Sniffles) under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, at 10X coverage, 76% to 84% of deletions and 80% to 92% of insertions in the gold standard set can be detected by PBHoney-Spots. Combining PBHoney-Spots and Sniffles greatly increased sensitivity, especially at lower coverages such as 6X. We further evaluated Mendelian errors on an Ashkenazi Jewish trio dataset with low-coverage whole-genome PacBio sequencing. In addition, to automate SV calling, we developed a computational pipeline called NextSV, which integrates PBHoney and Sniffles and generates the union (high sensitivity) or intersection (high specificity) call sets. Our results provide useful guidelines for SV identification from low-coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs from long-read sequencing data.
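
To illustrate the difference between a union and an intersection call set, the following is a minimal sketch in Python. It is not the NextSV implementation: the `SV` record, the function names, and the 50% reciprocal-overlap matching rule are all assumptions made for illustration, and the abstract does not state how NextSV decides that two calls describe the same event.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (interval-based calls only)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the intervals are disjoint."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def same_event(a: SV, b: SV, min_ro: float = 0.5) -> bool:
    """Assumed matching rule: same chromosome, same type, enough overlap."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and reciprocal_overlap(a, b) >= min_ro
    )


def intersection(calls_a: List[SV], calls_b: List[SV], min_ro: float = 0.5) -> List[SV]:
    """Calls from caller A that are supported by at least one call from caller B."""
    return [a for a in calls_a if any(same_event(a, b, min_ro) for b in calls_b)]


def union(calls_a: List[SV], calls_b: List[SV], min_ro: float = 0.5) -> List[SV]:
    """All calls from caller A plus the calls from caller B that A did not report."""
    extra = [b for b in calls_b if not any(same_event(a, b, min_ro) for a in calls_a)]
    return list(calls_a) + extra


# Toy call sets standing in for the output of two callers.
caller_a = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 500, 900, "DEL")]
caller_b = [SV("chr1", 1100, 2050, "DEL"), SV("chr3", 100, 400, "DEL")]

shared = intersection(caller_a, caller_b)  # only the chr1 deletion
combined = union(caller_a, caller_b)       # chr1, chr2 and chr3 deletions
```

In this toy example the intersection keeps only the call both callers agree on, which favors specificity, while the union keeps every distinct call, which favors sensitivity. Insertions, which are usually reported at a single position, would need a different matching rule (for example, a breakpoint distance threshold) and are left out of this sketch.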
