Hydra: A mixture modeling framework for subtyping pediatric cancer cohorts using multimodal gene expression signatures

Precision oncology has primarily relied on coding mutations as biomarkers of response to therapies. While transcriptome analysis can provide valuable information, incorporating it into precision oncology workflows has been difficult. For example, relative rather than absolute gene expression levels must be considered, which requires differential expression analysis across samples. However, expression programs related to the cell of origin and to tumor microenvironment effects confound the search for cancer-specific expression changes. To address these challenges, we developed an unsupervised clustering approach for discovering differential pathway expression within cancer cohorts using gene expression measurements. The hydra approach uses a Dirichlet process mixture model to automatically detect multimodally distributed genes and expression signatures without the need for matched normal tissue. We demonstrate that the hydra approach is more sensitive than widely used gene set enrichment approaches for detecting multimodal expression signatures. Application of the hydra analysis framework to small blue round cell tumors (including rhabdomyosarcoma, synovial sarcoma, neuroblastoma, Ewing sarcoma, and osteosarcoma) identified expression signatures associated with changes in the tumor microenvironment. The hydra approach also identified an association between ATRX deletions and elevated immune marker expression in high-risk neuroblastoma. Notably, hydra analysis of all small blue round cell tumors revealed similar subtypes, characterized by changes to infiltrating immune and stromal expression signatures.
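The idea of detecting a multimodally distributed gene with a Dirichlet process mixture model can be illustrated with a minimal sketch. This is not the hydra implementation: it assumes scikit-learn's `BayesianGaussianMixture` with a truncated Dirichlet process prior as a stand-in, and the function name `count_expression_modes`, the truncation level, the weight threshold, and the synthetic expression values are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def count_expression_modes(values, max_components=5, min_weight=0.1, seed=0):
    """Count mixture components that carry a non-trivial share of samples.

    Assumption: a gene is treated as multimodal when a truncated Dirichlet
    process Gaussian mixture keeps more than one component whose weight
    exceeds `min_weight`. Both thresholds are illustrative, not hydra defaults.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    model = BayesianGaussianMixture(
        n_components=max_components,  # truncation level of the DP
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    )
    model.fit(x)
    # Components the data do not support are shrunk towards zero weight.
    return int(np.sum(model.weights_ > min_weight))


# Synthetic expression values for one gene across 200 samples:
# two well-separated groups, e.g. low- and high-expressing tumors.
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(8.0, 1.0, 100)])

n_modes = count_expression_modes(bimodal)
is_multimodal = n_modes >= 2
print(n_modes, is_multimodal)
```

Because the number of retained components is inferred from the data rather than fixed in advance, a screen of this kind needs no matched normal tissue; the cohort itself defines the expression groups. In practice the weight threshold would need tuning, since variational fits can split a single broad mode into several components.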
