Inferring Clonal Composition from Multiple Sections of a Breast Cancer

Cancers arise from successive rounds of mutation and selection, generating clonal populations that vary in size, mutational content and drug responsiveness. Ascertaining the clonal composition of a tumor is therefore important for both prognosis and therapy. Mutation counts and frequencies resulting from next-generation sequencing (NGS) potentially reflect a tumor's clonal composition; however, deconvolving NGS data to infer a tumor's clonal structure presents a major challenge. We propose a generative model for NGS data derived from multiple subsections of a single tumor, and we describe an expectation-maximization procedure for estimating the clonal genotypes and relative frequencies using this model. We demonstrate, via simulation, the validity of the approach, and then use our algorithm to assess the clonal composition of a primary breast cancer and an associated metastatic lymph node. After dividing the tumor into subsections, we perform exome sequencing for each subsection to assess mutational content, followed by deep sequencing to precisely count normal and variant alleles within each subsection. By quantifying the frequencies of 17 somatic variants, we demonstrate that our algorithm predicts clonal relationships that are both phylogenetically and spatially plausible. Applying this method to larger numbers of tumors should cast light on the clonal evolution of cancers in space and time.
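
To make the idea of a generative model for multi-section read counts concrete, the following is a minimal illustrative sketch, not the model specified in the paper. It assumes that every variant is heterozygous in a diploid region, that sequencing is error-free, and that variant read counts are binomially distributed around the expected variant allele fraction. All names (`expected_vaf`, `simulate_counts`, `freqs`, `genotypes`) and the toy numbers are hypothetical.

```python
import random


def expected_vaf(freqs, genotypes):
    """Expected variant allele fraction for each (section, variant) pair.

    freqs[s][c]     : fraction of cells in section s that belong to clone c
    genotypes[c][v] : 1 if clone c carries variant v, else 0
    Assumption: variants are heterozygous in diploid regions, hence the 0.5.
    """
    n_variants = len(genotypes[0])
    return [
        [0.5 * sum(f_c * g[v] for f_c, g in zip(row, genotypes))
         for v in range(n_variants)]
        for row in freqs
    ]


def simulate_counts(freqs, genotypes, depth, rng):
    """Draw variant read counts as Binomial(depth, expected VAF)."""
    vaf = expected_vaf(freqs, genotypes)
    return [
        # Each read independently shows the variant allele with probability p.
        [sum(rng.random() < p for _ in range(depth)) for p in row]
        for row in vaf
    ]


# Hypothetical toy data: 3 clones (the first is normal tissue), 4 variants.
# Variant 3 is carried by no clone, so it acts as a negative control.
genotypes = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]
# Clone frequencies in 2 tumor sections; each row sums to 1.
freqs = [
    [0.2, 0.5, 0.3],
    [0.4, 0.0, 0.6],
]

rng = random.Random(0)
vaf = expected_vaf(freqs, genotypes)
counts = simulate_counts(freqs, genotypes, depth=1000, rng=rng)
```

Under these assumptions, an estimation procedure such as expectation-maximization would work in the opposite direction: given only the observed counts and depths, it would search for the clone genotypes and per-section frequencies that make those counts most likely.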
