Inferring tumour purity and stromal and immune cell admixture from expression data

Infiltrating stromal and immune cells form the major fraction of normal cells in tumour tissue; they not only perturb the tumour signal in molecular studies but also play an important role in cancer biology. Here we describe ‘Estimation of STromal and Immune cells in MAlignant Tumours using Expression data’ (ESTIMATE), a method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples. ESTIMATE scores correlate with DNA copy number-based tumour purity across samples from 11 different tumour types, profiled on Agilent or Affymetrix platforms or by RNA sequencing and available through The Cancer Genome Atlas. The prediction accuracy is further corroborated using 3,809 transcriptional profiles available elsewhere in the public domain. The ESTIMATE method allows tumour-associated normal cells to be taken into account in genomic and transcriptomic studies. An R library is available at https://sourceforge.net/projects/estimateproject/.
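To make the idea of "scoring a sample against an expression signature" concrete, here is a minimal sketch in Python. The published method scores each signature per sample with single-sample gene set enrichment analysis (ssGSEA). The code below is a simplified ssGSEA-style score written for illustration only; it is not the code in the ESTIMATE R library. The function names `ssgsea_like_score` and `estimate_like_scores`, the weighting exponent `alpha=0.25`, and the toy genes `G0`–`G7` are all assumptions introduced here, and the real stromal and immune signatures are not reproduced.

```python
import numpy as np


def ssgsea_like_score(expr, gene_names, gene_set, alpha=0.25):
    """Simplified ssGSEA-style enrichment score for one gene set.

    expr: (n_genes, n_samples) expression matrix; gene_names: length n_genes.
    Returns one score per sample. Illustrative sketch, not the ESTIMATE library code.
    """
    in_set = np.isin(np.asarray(gene_names), list(gene_set))
    n_hit = int(in_set.sum())
    n_miss = expr.shape[0] - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must match some, but not all, genes")
    scores = np.empty(expr.shape[1])
    for j in range(expr.shape[1]):
        values = expr[:, j]
        # Within-sample ranks: 1 = lowest expression (ties broken by position).
        ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable") + 1.0
        # Walk down the genes from highest to lowest expression.
        order = np.argsort(-values, kind="stable")
        tag = in_set[order]
        weights = ranks[order] ** alpha
        # Running fractions: rank-weighted for set genes, uniform for the others.
        cdf_hit = np.cumsum(np.where(tag, weights, 0.0)) / weights[tag].sum()
        cdf_miss = np.cumsum(~tag) / n_miss
        # ssGSEA sums the difference over all positions instead of taking its maximum.
        scores[j] = np.sum(cdf_hit - cdf_miss)
    return scores


def estimate_like_scores(expr, gene_names, stromal_set, immune_set):
    """Return (stromal, immune, combined) scores; combined = stromal + immune."""
    stromal = ssgsea_like_score(expr, gene_names, stromal_set)
    immune = ssgsea_like_score(expr, gene_names, immune_set)
    return stromal, immune, stromal + immune


# Toy data: 8 genes x 2 samples. G0-G3 stand in for signature genes (hypothetical).
genes = [f"G{i}" for i in range(8)]
expr = np.array([
    [9.0, 1.0],
    [8.5, 1.2],
    [8.0, 1.1],
    [7.5, 1.3],
    [3.0, 6.0],
    [2.5, 6.5],
    [2.0, 7.0],
    [1.5, 7.5],
])
stromal, immune, combined = estimate_like_scores(expr, genes, {"G0", "G1"}, {"G2", "G3"})
print(combined)  # sample 0 (signature genes high) scores above sample 1
```

Because the score depends only on within-sample ranks, it can be computed for a single profile without reference to other samples. This is one reason rank-based enrichment scores are a natural fit for comparing data from different platforms such as Agilent, Affymetrix and RNA sequencing.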
