Mutation Discovery in Regions of Segmental Cancer Genome Amplifications with CoNAn-SNV: A Mixture Model for Next Generation Sequencing of Tumors

Next generation sequencing now enables cost-effective enumeration of the full mutational complement of a tumor genome, in particular its single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties such as somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV), a novel algorithm for inferring SNVs that overlap copy number alterations. The method models the notion that genomic regions of segmental duplication and amplification induce an extended genotype space in which a subset of genotypes exhibit heavily skewed allelic distributions, rendering the corresponding SNVs undetectable by methods that assume diploidy. We introduce the concept of modelling allelic counts from sequencing data with a panel of Binomial mixture models, where the number of mixture components at a given locus is informed by a discrete copy number state supplied as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set from a lobular breast cancer and show that it discovers 21 experimentally revalidated somatic non-synonymous mutations that were not detected by copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not come at a disproportionate loss of specificity. This was further supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV produced results similar to those of other callers except in regions of copy number gain, where it conferred increased sensitivity.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
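
The idea of a copy-number-informed Binomial mixture can be illustrated with a minimal sketch. It is not the CoNAn-SNV implementation: the abstract does not specify the genotype sets, priors, or fitting procedure, so the code below assumes that a locus with integer copy number c has c + 1 genotypes (0 to c variant alleles), that the expected variant allele fraction for k variant alleles is k / c (clipped by a hypothetical sequencing error rate), and that the mixture weights are uniform. The function names and parameters are illustrative only.

```python
from math import comb


def genotype_posteriors(variant_reads, depth, copy_number,
                        error_rate=0.01, priors=None):
    """Posterior over the number of variant alleles k = 0..copy_number.

    Assumption (not from the paper): one mixture component per k, with
    expected variant allele fraction k / copy_number.
    """
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    if not 0 <= variant_reads <= depth:
        raise ValueError("variant_reads must lie between 0 and depth")

    n_states = copy_number + 1
    if priors is None:
        # Assumed uniform mixture weights; a real caller would learn these.
        priors = [1.0 / n_states] * n_states

    # Clip fractions away from 0 and 1 so that sequencing errors are allowed.
    fractions = [min(max(k / copy_number, error_rate), 1.0 - error_rate)
                 for k in range(n_states)]

    # Prior times Binomial likelihood of the observed allelic counts.
    n_choose_a = comb(depth, variant_reads)
    weighted = [
        p * n_choose_a * mu ** variant_reads * (1.0 - mu) ** (depth - variant_reads)
        for p, mu in zip(priors, fractions)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def variant_probability(variant_reads, depth, copy_number, **kwargs):
    """Probability that at least one allele at the locus carries the variant."""
    return 1.0 - genotype_posteriors(variant_reads, depth, copy_number, **kwargs)[0]


# A skewed locus: 5 variant reads out of 40.
p_diploid = variant_probability(5, 40, copy_number=2)
p_amplified = variant_probability(5, 40, copy_number=4)
print(f"diploid model: {p_diploid:.4f}, four-copy model: {p_amplified:.4f}")
```

Under these assumptions, a locus with 5 variant reads out of 40 is explained mostly by the reference genotype when only diploid genotypes are available, whereas the four-copy model includes a genotype with one variant allele in four and assigns the variant a high probability. This is the kind of skewed allelic distribution that the abstract describes as undetectable by methods that assume diploidy.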
