BAMClipper: removing primers from alignments to minimize false-negative mutations in amplicon next-generation sequencing

Amplicon-based next-generation sequencing (NGS) has been widely adopted for detecting genetic variation in humans and other organisms. The conventional data analysis paradigm trims primers before read mapping. Here we introduce BAMClipper, which instead maps the original sequencing reads first and then removes primer sequences by soft-clipping the SAM/BAM alignments. The choice of primer handling approach affected mutation detection accuracy, as shown with real NGS datasets of 7 human peripheral blood or breast cancer tissue samples with known BRCA1/BRCA2 mutations and with >130000 simulated NGS datasets carrying unique mutations. The BAMClipper approach detected a BRCA1 deletion (c.1620_1636del) that was otherwise missed due to an edge effect. Simulation showed a high false-negative rate when primers were perfectly trimmed before mapping, as in conventional practice. Among the other 6 samples, the variant allele frequencies of 5 BRCA1/BRCA2 mutations (indels or single-nucleotide variants) were diluted by apparently wild-type primer sequences from an overlapping amplicon (17 to 82% under-estimation). BAMClipper was robust in both situations, and all 7 mutations were detected. Compared with Cutadapt, BAMClipper was faster and maintained equally high primer removal effectiveness. BAMClipper is implemented in Perl and is available under an open source MIT license at https://github.com/tommyau/bamclipper.
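To illustrate what soft-clipping a primer from an existing alignment involves, the following is a minimal Python sketch. It is not BAMClipper's own code (which is written in Perl). The function name `soft_clip_left` and its parameters are hypothetical, and it handles only a primer at the left (lower-coordinate) end of an alignment; the right end would be treated symmetrically. Given the alignment's 1-based POS, its CIGAR string, and the reference coordinate of the last primer base, it converts every read base aligned within the primer into a soft clip (`S`) and moves POS to the first retained base.

```python
import re

def soft_clip_left(pos, cigar, primer_end):
    """Soft-clip read bases aligned at or before primer_end (1-based, inclusive).

    Hypothetical helper for illustration only. Returns (new_pos, new_cigar).
    """
    if pos > primer_end:
        return pos, cigar  # alignment starts after the primer: nothing to clip

    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    # Hard clips at either end are kept untouched.
    lead_h = [ops.pop(0)] if ops and ops[0][1] == "H" else []
    trail_h = [ops.pop()] if ops and ops[-1][1] == "H" else []

    clipped = 0    # read bases that end up in the leading soft clip
    ref_pos = pos  # reference coordinate of the next aligned base
    while ops:
        n, op = ops[0]
        if op in "SI":            # consumes read only: fold into the soft clip
            clipped += n
            ops.pop(0)
        elif op in "DNP":         # consumes no read bases: drop from the start
            if op != "P":
                ref_pos += n
            ops.pop(0)
        else:                     # M, = or X: consumes read and reference
            if ref_pos > primer_end:
                break
            k = min(n, primer_end - ref_pos + 1)
            clipped += k
            ref_pos += k
            if k == n:
                ops.pop(0)
            else:
                ops[0] = (n - k, op)
                break

    # If ops is empty here, the whole read lay inside the primer; a real tool
    # would need to decide how to treat such a read (this sketch does not).
    new_ops = lead_h + ([(clipped, "S")] if clipped else []) + ops + trail_h
    return ref_pos, "".join(f"{n}{op}" for n, op in new_ops)

# A 50-base read starting at position 100 with a 20-base primer (100..119).
print(soft_clip_left(100, "50M", 119))  # (120, '20S30M')
```

Because the read is mapped at full length before clipping, its primer bases still help anchor the alignment. Once soft-clipped, those bases remain in the record but are no longer part of the aligned region, so they do not contribute apparently wild-type sequence at positions covered by an overlapping amplicon.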
