Bioinformatic strategies for the analysis of genomic aberrations detected by targeted NGS panels with clinical application

Molecular profiling of tumor samples has gained importance in cancer research and now also plays an important role in the clinical management of cancer patients. Rapid identification of genomic aberrations improves diagnosis, prognosis, and the selection of effective therapy. This progress can be attributed mainly to the development of next-generation sequencing (NGS) methods, especially targeted DNA panels. Such panels enable relatively inexpensive and rapid analysis of various clinically relevant aberrations specific to particular diagnoses. In this review, we discuss the experimental approaches and bioinformatic strategies available for developing an NGS panel for reliable analysis of selected biomarkers. Compliance with defined analytical steps is crucial to ensure accurate and reproducible results. In addition, a careful validation procedure must be performed before targeted NGS assays are applied in routine clinical practice. Focusing on bioinformatics, we emphasize the need for thorough pipeline validation and management, tailored to the particular experimental setting, as an integral part of establishing an NGS method. A robust and reproducible bioinformatic analysis, run on sufficiently powerful computing infrastructure, is essential for proper detection of genomic variants in clinical settings, since distinguishing real biological variants from experimental noise is fundamental. This review summarizes state-of-the-art bioinformatic solutions for the detection of single-nucleotide variants (SNVs), small insertions and deletions (indels), and copy number variations (CNVs) in targeted sequencing data, enabling the translation of sequencing data into clinically relevant information. Finally, we share our experience with the development of a custom targeted NGS panel for integrated analysis of biomarkers in lymphoproliferative disorders.
