DeMix: deconvolution for mixed cancer transcriptomes using raw measured data

MOTIVATION: Tissue samples of tumor cells mixed with stromal cells cause underdetection of gene expression signatures associated with cancer prognosis or response to treatment. In silico dissection of mixed cell samples is essential for analyzing expression data generated in cancer studies. Currently, no systematic approach addresses three challenges in computational deconvolution: (i) violation of linear addition of expression levels from multiple tissues when log-transformed microarray data are used; (ii) estimation of both tumor proportion and tumor-specific expression, when neither is known a priori; and (iii) estimation of expression profiles for individual patients.

RESULTS: We have developed a statistical method for deconvolving mixed cancer transcriptomes, DeMix, which addresses the aforementioned issues in array-based expression data. We demonstrate the performance of our model on synthetic and real, publicly available datasets. DeMix can be applied to ongoing biomarker-based clinical studies and to the vast expression datasets previously generated from mixed tumor and stromal cell samples.

AVAILABILITY: All code is written in C and integrated into an R function, which is available at http://odin.mdacc.tmc.edu/~wwang7/DeMix.html.

CONTACT: wwang7@mdanderson.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
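Challenge (i) in the abstract, the violation of linear addition under log transformation, can be illustrated with a small numeric sketch. This is not the DeMix model itself. It assumes a single gene, a hypothetical tumor proportion `pi`, and made-up raw expression values, and it only shows that mixing is additive on the raw scale but not on the log2 scale. The function name `mix_raw` and all numbers are illustrative assumptions.

```python
import math


def mix_raw(tumor, stroma, pi):
    """Linear mixing: pi * tumor + (1 - pi) * stroma."""
    return pi * tumor + (1.0 - pi) * stroma


# Hypothetical raw expression of one gene in pure tissues (arbitrary units).
tumor_raw = 1600.0
stroma_raw = 100.0
pi = 0.5  # assumed tumor proportion, not an estimate from any dataset

# On the raw scale the mixed sample is a weighted sum of its components.
observed_raw = mix_raw(tumor_raw, stroma_raw, pi)  # 0.5*1600 + 0.5*100 = 850

# The log2 value an array would report for the mixed sample.
observed_log = math.log2(observed_raw)

# What one would get by (incorrectly) applying the same weights to log2 values.
naive_log = mix_raw(math.log2(tumor_raw), math.log2(stroma_raw), pi)

# Because log is concave, the weighted mean of logs underestimates the log of
# the weighted mean whenever the two components differ.
gap = observed_log - naive_log

print(f"raw mixture:              {observed_raw:.1f}")
print(f"log2 of raw mixture:      {observed_log:.3f}")
print(f"weighted mean of log2:    {naive_log:.3f}")
print(f"discrepancy (log2 units): {gap:.3f}")
```

The discrepancy is strictly positive whenever tumor and stromal expression differ, which is why a deconvolution that assumes linear addition needs to work with raw measured data rather than log-transformed values.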
