HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data

Background: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because true DNMs are few relative to the genome-wide false positives in next-generation sequencing (NGS) data. Tools such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls.

Results: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked-read sequencing to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes, and a haploid genotype is then generated for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes, because they are almost certainly false positives. Our experiments on 10X Chromium linked-read sequencing trio data reveal that HAPDeNovo eliminates 80% to 99% of false positives, regardless of how large the candidate DNM set is.

Conclusions: HAPDeNovo leverages the haplotype information from linked-read sequencing to remove false positive DNMs effectively, and it increases the accuracy of DNM detection dramatically without sacrificing sensitivity.
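The filtering rule described in the Results can be illustrated with a minimal sketch. It is not HAPDeNovo's actual implementation: the function names `haplotype_is_mixed` and `keep_candidate`, the `(haplotype, base)` input layout, and the `min_reads` and `min_frac` thresholds are all assumptions made for illustration. The idea is only that reads already assigned to one haplotype should support a single allele, so a candidate site where one haplotype carries both the reference and the alternate allele is treated as a likely false positive.

```python
from collections import Counter


def haplotype_is_mixed(bases, ref, alt, min_reads=2, min_frac=0.2):
    """Return True if reads from ONE haplotype support both ref and alt.

    min_reads and min_frac are illustrative thresholds (assumptions),
    used so that a single sequencing error does not count as support.
    """
    counts = Counter(bases)
    total = counts[ref] + counts[alt]
    if total == 0:
        return False
    for allele in (ref, alt):
        if counts[allele] < min_reads or counts[allele] / total < min_frac:
            return False
    return True


def keep_candidate(reads, ref, alt):
    """Decide whether a putative DNM survives the haplotype filter.

    reads: iterable of (haplotype, base) pairs, haplotype in {1, 2}.
    Reads with any other haplotype label (unphased) are ignored.
    """
    by_hap = {1: [], 2: []}
    for hap, base in reads:
        if hap in by_hap:
            by_hap[hap].append(base)
    # Discard the candidate if either haplotype looks heterozygous.
    return not any(haplotype_is_mixed(b, ref, alt) for b in by_hap.values())


# Hypothetical candidate A>T: haplotype 1 carries only T, haplotype 2 only A.
consistent = [(1, "T")] * 5 + [(2, "A")] * 5
# Hypothetical artifact: haplotype 1 carries both A and T.
mixed = [(1, "T")] * 3 + [(1, "A")] * 3 + [(2, "A")] * 5

print(keep_candidate(consistent, "A", "T"))
print(keep_candidate(mixed, "A", "T"))
```

In this sketch the first candidate is kept because each haplotype is internally consistent, while the second is discarded because haplotype 1 appears heterozygous. A real pipeline would take the haplotype assignments from the linked-read phasing output rather than from hand-built tuples.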
