Systematic and comprehensive benchmarking of an exome sequencing based germline copy-number analysis pipeline to detect clinically relevant CNVs

Purpose: Detecting germline copy-number variants (CNVs) from exome sequencing (ES) is not standard practice in clinical settings because of concerns about performance. We added a CNV pipeline to our clinical ES workflow and comprehensively characterized its performance.

Methods: We used a cohort of 387 individuals (351 probands and 36 family members) with both clinical chromosomal microarray (CMA) and ES data available to compare CNV calls and characterize the CNVs detected from ES. To reduce the number of false positives, we excluded exons with low mappability scores prior to variant calling. Reproducibility was assessed by running 1,000 iterations per sample using permuted random subsets of 200 controls.

Results: The ES-based CNV pipeline was 93% sensitive for all deletions and duplications when compared with the gold-standard true-positive CNV calls defined by CMA. The modified workflow, which excluded low-mappability exons, drastically reduced both the total number of CNVs identified per sample and the number of false positives while retaining a high sensitivity of 90%. The exome-based CNV pipeline was 100% sensitive for clinically relevant rare variants (including single-exon deletions) and was highly reproducible.

Conclusion: Exome-based CNV detection can be reliably used in a clinical setting and can increase the overall diagnostic yield.
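As an illustration of how sensitivity against a CMA-defined truth set might be computed, the following is a minimal sketch, not the pipeline used in the study. The `CNV` record, the `overlap_fraction` and `sensitivity` helpers, the 50% coverage threshold, and the example coordinates are all hypothetical; the abstract does not state which overlap criterion was applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def overlap_fraction(truth: CNV, call: CNV) -> float:
    """Fraction of the truth interval covered by a call of the same type."""
    if truth.chrom != call.chrom or truth.kind != call.kind:
        return 0.0
    shared = min(truth.end, call.end) - max(truth.start, call.start)
    return max(shared, 0) / (truth.end - truth.start)


def sensitivity(truth_set, call_set, min_fraction=0.5):
    """Share of truth CNVs matched by at least one call.

    min_fraction is an assumed threshold, not a value from the study.
    """
    if not truth_set:
        raise ValueError("truth set is empty")
    detected = sum(
        1
        for t in truth_set
        if any(overlap_fraction(t, c) >= min_fraction for c in call_set)
    )
    return detected / len(truth_set)


# Made-up example data: CMA calls as truth, ES calls as predictions.
cma_truth = [
    CNV("chr1", 1000, 5000, "DEL"),
    CNV("chr2", 200, 800, "DUP"),
    CNV("chr7", 10000, 12000, "DEL"),
    CNV("chrX", 300, 900, "DEL"),
]
es_calls = [
    CNV("chr1", 1200, 5200, "DEL"),    # covers most of the truth deletion
    CNV("chr2", 100, 900, "DUP"),      # fully contains the truth duplication
    CNV("chr7", 11800, 13000, "DEL"),  # covers too little of the truth deletion
    CNV("chrX", 300, 900, "DUP"),      # same interval, wrong CNV type
]

result = sensitivity(cma_truth, es_calls)
print(f"sensitivity = {result:.2f}")
```

In this toy data two of the four CMA calls are recovered. A stricter criterion, such as reciprocal overlap, would change which calls count as detected, so the threshold should be fixed before comparing workflows.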
