MAGERI: Computational pipeline for molecular-barcoded targeted resequencing

Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing and are the most promising approach for accurately identifying rare variants in complex DNA samples. This approach has applications in multiple areas, including cancer diagnostics, and thus demands dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline that efficiently handles all caveats of UMI-based analysis to obtain high-fidelity mutation profiles and call ultra-rare variants. Using an extensive set of benchmark datasets, including gold-standard biological samples with known variant frequencies, cell-free DNA from blood samples of tumor patients, and publicly available UMI-encoded datasets, we demonstrate that our method is both robust and efficient in calling rare variants. The versatility of our software is supported by accurate results obtained for both tumor DNA and viral RNA samples in datasets prepared using three different UMI-based protocols.
