Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach

Background: The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging.

Results: Here, we present a method for Position-Based Variant Identification (PBVI) that uses empirically derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that this method improves detection of single-nucleotide mosaic variants with variant allele fractions of 0.01–0.05 compared to other low-level variant callers. At depths of 600× and 1200×, we observed >85% and >95% sensitivity, respectively. In a cohort of 26 individuals with somatic overgrowth disorders, PBVI showed an improved signal-to-noise ratio, identifying pathogenic variants in 17 individuals.

Conclusion: PBVI can facilitate identification of low-level mosaic variants, thus increasing the utility of next-generation sequencing data for research and diagnostic purposes.
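To make the idea of a position-based, nucleotide-specific comparison concrete, the following is a minimal sketch and not the published PBVI implementation. It assumes that, for one genomic position and one alternate nucleotide, we have alternate allele fractions from a set of control samples, and it scores a test sample with a simple empirical p-value. The function names, the control values, and the choice of an empirical p-value are all illustrative assumptions; the abstract does not specify the statistical test used.

```python
from typing import Sequence


def alt_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at one position supporting one alternate nucleotide."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def empirical_p_value(sample_fraction: float,
                      control_fractions: Sequence[float]) -> float:
    """Share of controls with an alternate fraction at least as large as the
    sample's, with a +1 correction so the p-value is never exactly zero.

    Assumption: this simple rank-based score stands in for whatever test the
    authors actually apply to their empirically derived distributions.
    """
    exceed = sum(1 for f in control_fractions if f >= sample_fraction)
    return (exceed + 1) / (len(control_fractions) + 1)


# Hypothetical control fractions for one position and one alternate
# nucleotide (for example, G>A); these are made-up illustration values.
controls = [0.001, 0.002, 0.0, 0.0015, 0.001, 0.0005, 0.002, 0.001, 0.0, 0.001]

# Hypothetical test sample: 24 alternate reads at 1200x depth.
sample = alt_fraction(alt_reads=24, depth=1200)
p = empirical_p_value(sample, controls)
print(f"sample fraction={sample:.3f}, empirical p={p:.3f}")
```

Because the background is estimated separately for each position and each alternate nucleotide, a site with a high recurrent error rate needs a higher observed fraction to stand out than a quiet site does. With only ten controls the smallest attainable p-value here is 1/11, so a real control dataset would need to be much larger to resolve small p-values.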
