LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets

The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high coverage this affords can, in principle, be used to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity and significantly improved sensitivity compared with existing methods, and that it can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation of LoFreq on two different platforms (Fluidigm and Sequenom) and its application to calling rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
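To make the idea of quality-aware calling concrete, the sketch below tests whether the number of non-reference bases at one alignment column exceeds what sequencing error alone would plausibly produce. It is an illustration under stated assumptions, not LoFreq's implementation: it assumes each base's error probability comes from its Phred quality score, that errors are independent across reads, and that every error would show up as the candidate allele (a conservative simplification). The function names and the default threshold `alpha` are hypothetical.

```python
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into an error probability."""
    return 10.0 ** (-q / 10.0)


def error_count_tail(error_probs: Sequence[float], k: int) -> float:
    """Return P(X >= k), where X counts errors among independent reads.

    Read i is an error with probability error_probs[i], so X follows a
    Poisson-binomial distribution. The tail is computed exactly by dynamic
    programming, tracking only the counts 0..k-1 and absorbing any mass
    that reaches k.
    """
    if k <= 0:
        return 1.0
    if k > len(error_probs):
        return 0.0
    dist = [0.0] * k  # dist[j] = P(exactly j errors so far), j < k
    dist[0] = 1.0
    tail = 0.0        # P(at least k errors so far)
    for p in error_probs:
        tail += dist[k - 1] * p  # mass moving from k-1 errors to k
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return tail


def is_candidate_variant(quals: Sequence[float], alt_count: int,
                         alpha: float = 0.01) -> bool:
    """Flag a column whose alt_count is unlikely under errors alone.

    alpha is an illustrative significance threshold (an assumption); a real
    caller would also correct for testing many positions.
    """
    probs = [phred_to_error_prob(q) for q in quals]
    return error_count_tail(probs, alt_count) < alpha


if __name__ == "__main__":
    column = [30] * 1000  # 1000 reads, all with base quality Q30
    print(is_candidate_variant(column, 10))  # ten variant bases
    print(is_candidate_variant(column, 1))   # a single variant base
```

With 1000 reads at Q30 roughly one erroneous base is expected, so a single variant base is consistent with error while ten are not. Because the test uses the observed per-base qualities rather than a fixed error rate, the same count can be significant in a high-quality column and insignificant in a low-quality one.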
