Pre-capture multiplexing provides additional power to detect copy number variation in exome sequencing

Background: As exome sequencing (ES) becomes part of clinical practice, we should make every effort to use all of the information it generates. Copy-number variation can cause Mendelian disorders, but small copy-number variants (CNVs) are often overlooked or obscured by under-powered data collection. Many groups have developed methods for detecting CNVs from ES, but existing methods often perform poorly for small CNVs and rely on large numbers of reference samples that clinical laboratories do not always have. Furthermore, many methods take Bayesian approaches that require user-defined priors, even when there is insufficient prior knowledge to set them. This report first demonstrates the benefit of multiplexed exome capture (pooling samples prior to capture), then presents a novel detection algorithm, mcCNV (“multiplexed capture CNV”), built around multiplexed capture.

Results: We demonstrate that (1) multiplexed capture reduces inter-sample variance, and (2) mcCNV, a novel depth-based algorithm for detecting CNVs from multiplexed-capture ES data, improves the detection of small CNVs. We contrast our approach, which is agnostic to prior information, with the commonly used ExomeDepth. In a simulation study, mcCNV demonstrated a favorable false discovery rate (FDR). When compared with calls made from matched genome sequencing, mcCNV performs comparably to ExomeDepth.

Conclusion: Implementing multiplexed capture increases power to detect single-exon CNVs. The mcCNV algorithm may provide a more favorable FDR than ExomeDepth. The greatest benefits of our approach are that it (1) does not require a database of reference samples and (2) does not require prior information about the prevalence or size of variants.
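The abstract does not specify how mcCNV models read depth. The code below is therefore only a generic sketch of depth-based, single-exon CNV calling within one multiplexed capture pool, and it is not the mcCNV algorithm. It assumes three things:

- a negative binomial depth model with a fixed, user-chosen dispersion `phi`;
- an expected depth for each sample taken from the other samples in the same pool (leave-one-out);
- Benjamini-Hochberg control of the FDR across all sample/exon tests.

The function names and the default `phi = 0.002` are hypothetical. Lower inter-sample variance, which the abstract attributes to multiplexed capture, corresponds here to a smaller `phi`, which gives more power for single-exon events.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of pool-based CNV calling; NOT the mcCNV implementation.


def nb_two_sided_p(x: int, mu: float, phi: float) -> float:
    """Two-sided p-value for count x under a negative binomial with mean mu
    and dispersion phi > 0 (variance = mu + phi * mu**2)."""
    r = 1.0 / phi                      # NB "size" parameter
    p = r / (r + mu)                   # NB success probability
    log_const = r * math.log(p) - math.lgamma(r)
    log_q = math.log1p(-p)

    def pmf(k: int) -> float:
        return math.exp(math.lgamma(k + r) - math.lgamma(k + 1) + log_const + k * log_q)

    lower = sum(pmf(k) for k in range(x + 1))   # P(X <= x)
    upper = 1.0 - (lower - pmf(x))              # P(X >= x)
    return max(0.0, min(1.0, 2.0 * min(lower, upper)))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values (q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def call_cnvs(counts: List[List[int]], phi: float = 0.002,
              alpha: float = 0.05) -> List[Tuple[int, int, str, float]]:
    """Flag (sample, exon) pairs whose depth departs from the rest of the pool.

    counts[s][e] is the read count for sample s at exon e; all samples are
    assumed (hypothetically) to come from one multiplexed capture reaction.
    """
    n_samples, n_exons = len(counts), len(counts[0])
    lib = [sum(row) for row in counts]          # per-sample library size
    total_lib = sum(lib)
    exon_totals = [sum(counts[s][e] for s in range(n_samples)) for e in range(n_exons)]

    tests = []                                  # (sample, exon, obs/exp, p)
    for s in range(n_samples):
        other_lib = total_lib - lib[s]
        for e in range(n_exons):
            # Expected depth: this sample's library size times the exon's share
            # of reads among the *other* samples in the same capture pool.
            mu = lib[s] * (exon_totals[e] - counts[s][e]) / other_lib
            if mu <= 0:
                continue
            p = nb_two_sided_p(counts[s][e], mu, phi)
            tests.append((s, e, counts[s][e] / mu, p))

    # Control the FDR across every sample/exon test, then label the direction.
    qvals = benjamini_hochberg([t[3] for t in tests])
    return [(s, e, "DEL" if ratio < 1 else "DUP", round(ratio, 2))
            for (s, e, ratio, _), q in zip(tests, qvals) if q <= alpha]


if __name__ == "__main__":
    # Toy pool: 6 samples x 20 exons with exon-specific capture efficiency.
    scales = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    base = [200 + 10 * e for e in range(20)]
    counts = [[round(b * s) for b in base] for s in scales]
    counts[2][5] //= 2                           # simulated heterozygous deletion
    counts[4][12] = round(counts[4][12] * 1.5)   # simulated single-copy duplication
    for call in call_cnvs(counts):
        print(call)
```

On this toy pool the sketch is intended to flag only the two planted events. Each sample's expectation comes from its own capture pool, so no external reference database is involved. A practical caller would also need to handle GC bias, mappability, and multi-exon segments.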
