A new approach for detecting low-level mutations in next-generation sequence data

We propose a new method for detecting low-level mutations that incorporates population re-sequencing data, the distribution of reads, and strand bias. The method accurately identifies low-level mutations at frequencies as low as 2.3%, at an average coverage of 500× and with a false discovery rate of less than 1%. We also discuss other problems in detecting low-level mutations, including chimeric reads and sample cross-contamination, and provide possible solutions to them.
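
To make the scale of the problem concrete: at 500× coverage, a 2.3% mutation is supported by only about 11 or 12 reads, so a caller has to separate those reads from sequencing errors and from strand-specific artifacts. The sketch below is a generic illustration of that idea and is not the method proposed here. It combines a one-sided binomial test against an assumed per-base error rate with a simple strand-balance filter. The function names, the error rate of 0.5%, the significance threshold, and the minimum strand fraction are all hypothetical choices made for this example.

```python
from math import comb


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_low_level_variant(fwd_alt, rev_alt, fwd_depth, rev_depth,
                           error_rate=0.005, alpha=1e-3,
                           min_strand_fraction=0.1):
    """Illustrative low-level variant filter (all thresholds are assumptions).

    A site passes if (1) the number of alternate-allele reads is unlikely
    under sequencing error alone, and (2) the alternate reads are not
    confined to a single strand.
    """
    alt = fwd_alt + rev_alt
    depth = fwd_depth + rev_depth
    if depth == 0 or alt == 0:
        return False

    # (1) One-sided binomial test: are there more alternate reads than the
    # assumed error rate would produce by chance?
    p_value = binom_sf(alt, depth, error_rate)

    # (2) Strand-balance filter: fraction of alternate reads on the
    # less-represented strand.
    minority_fraction = min(fwd_alt, rev_alt) / alt

    return p_value < alpha and minority_fraction >= min_strand_fraction


# 12 alternate reads out of 500 (2.4%), evenly split across strands.
balanced = call_low_level_variant(6, 6, 250, 250)
# Same read count, but every alternate read is on the forward strand.
strand_biased = call_low_level_variant(12, 0, 250, 250)
# 3 alternate reads out of 500: compatible with the assumed error rate.
noise_level = call_low_level_variant(2, 1, 250, 250)
```

Under these assumed thresholds, the balanced site passes while the strand-biased and noise-level sites are rejected. A real caller would estimate the error rate per site, for example from population re-sequencing data, rather than fixing it globally.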
