Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data

Characterization of intratumoral heterogeneity is critical to cancer therapy, as the presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in the context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss of heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER infers the presence of subclone-specific alterations in individual cells and reconstructs the underlying subclonal architecture. By examining several tumor types, we show that HoneyBADGER effectively identifies deletions, amplifications, and copy-neutral loss-of-heterozygosity events and robustly detects subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to single cells from a patient with progressive multiple myeloma, identifying major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Other prominent transcriptional subpopulations within these tumors did not align with the genetic subclonal structure and were likely driven by alternative, nonclonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer.
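
To make the allele-based idea concrete, the following minimal Python sketch shows how read counts at heterozygous SNPs within a candidate region could support a deletion or loss-of-heterozygosity call in one cell. It is an illustration under stated assumptions, not HoneyBADGER's actual model: the function names, the simple binomial likelihood, and the parameters `p_neutral` and `p_loss` are hypothetical, and the sketch assumes that the allele suspected to be lost has already been determined (for example, from pooled cells). It also ignores monoallelic expression, which a realistic single-cell model would need to account for.

```python
from math import lgamma, log


def binom_logpmf(k, n, p):
    """Log binomial probability of k successes in n trials with success rate p."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def loh_log_odds(snp_counts, p_neutral=0.5, p_loss=0.05):
    """Log-odds that a region has lost one allele in a single cell.

    snp_counts: list of (reads_on_candidate_lost_allele, total_reads),
                one tuple per heterozygous SNP in the region.
    p_neutral:  assumed fraction of reads from the candidate allele when
                both alleles are present (hypothetical value).
    p_loss:     assumed residual fraction when the allele is lost, e.g.
                from sequencing error (hypothetical value).
    """
    score = 0.0
    for lost_allele_reads, depth in snp_counts:
        if depth == 0:
            continue  # SNP not covered in this cell; no evidence either way
        score += binom_logpmf(lost_allele_reads, depth, p_loss)
        score -= binom_logpmf(lost_allele_reads, depth, p_neutral)
    return score


# Hypothetical counts for two cells over the same three SNPs.
cell_with_loss = [(0, 6), (1, 8), (0, 5)]   # candidate allele almost absent
cell_neutral = [(3, 6), (4, 8), (2, 5)]     # both alleles observed

print(loh_log_odds(cell_with_loss) > 0)  # evidence favors allele loss
print(loh_log_odds(cell_neutral) < 0)    # evidence favors the neutral state
```

Pooling evidence across many SNPs in a region is what makes such a call feasible despite sparse per-SNP coverage in single cells; a positive score favors allele loss and a negative score favors the neutral state.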
