Transcriptional fidelity enhances cancer cell line selection in pediatric cancers

Multi-omic technologies have enabled comprehensive profiling of patient-derived tumor samples and of the cell lines intended to model them. Yet how well cancer cell lines reflect native pediatric cancers in the age of molecular subclassification remains unclear, and this gap represents an unmet clinical need. Here we use Treehouse public data to provide an RNA-seq-driven analysis of 799 cancer cell lines, focusing on how well they correlate with 1,655 pediatric tumor samples spanning 12 tumor types. For each tumor type, we present a ranked list of the most representative cell lines based on the correlation of their transcriptomic profiles with those of the tumor. We found that most (8/12) tumor types correlated best with a cell line of the closest matched disease type. We further showed that inferred molecular subtype differences in medulloblastoma significantly affected the correlation between medulloblastoma tumor samples and cell lines. Our results are available as an interactive web application to help researchers select cancer cell lines that more faithfully recapitulate pediatric cancer.
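
The ranking idea described above can be illustrated with a minimal sketch. The abstract does not state which correlation measure was used or how per-sample correlations were combined, so the Spearman correlation and the median aggregation below are assumptions, and all names (`rank_cell_lines`, `line_A`, and so on) and expression values are hypothetical toy data rather than Treehouse data.

```python
from statistics import mean, median


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_cell_lines(cell_lines, tumors):
    """Rank cell lines by median Spearman correlation to one tumor type.

    cell_lines: dict mapping cell line name -> expression vector
    tumors: list of expression vectors (same gene order) for one tumor type
    Returns a list of (name, score) pairs, best match first.
    """
    scores = {
        name: median(spearman(expr, tumor) for tumor in tumors)
        for name, expr in cell_lines.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression values over four genes (hypothetical, for illustration only).
tumors = [[5.0, 3.0, 1.0, 0.5], [6.0, 2.5, 1.2, 0.1]]
cell_lines = {
    "line_A": [4.8, 3.1, 0.9, 0.4],
    "line_B": [0.2, 1.0, 3.0, 6.0],
    "line_C": [5.0, 0.3, 1.0, 2.0],
}

ranking = rank_cell_lines(cell_lines, tumors)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the cell line whose gene ordering matches the tumors ranks first and the one with the reversed ordering ranks last. A real analysis would work on normalized expression over thousands of genes and would need to handle constant vectors, which this sketch does not.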
