CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data

Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single-cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most current methods estimate either cell-type proportions or cell-type-specific expression profiles and require the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input they require may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with that of seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available as MATLAB and Octave code in a GitHub repository: https://github.com/kkang7/CDSeq.
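To make the distinction between partial and complete deconvolution concrete, the following minimal sketch simulates bulk expression under a simple linear mixing assumption and then performs a partial deconvolution, recovering proportions when the cell-type profiles are supplied as input. It is not CDSeq's algorithm: the linear model, the least-squares solver, the function names, and the dimensions are all assumptions chosen for illustration. A complete method would have to estimate both `profiles` and `proportions` from `bulk` alone.

```python
import numpy as np


def simulate_bulk(profiles, proportions):
    """Assumed linear mixing model: bulk (genes x samples) = profiles @ proportions."""
    return profiles @ proportions


def partial_deconvolve(bulk, profiles):
    """Partial deconvolution: estimate proportions given KNOWN profiles.

    Uses ordinary least squares, then clips negatives and renormalises each
    sample's proportions to sum to 1 (an illustrative choice, not CDSeq).
    """
    est, *_ = np.linalg.lstsq(profiles, bulk, rcond=None)
    est = np.clip(est, 0.0, None)
    return est / est.sum(axis=0, keepdims=True)


rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 200, 3, 5  # hypothetical sizes

# Each column is one cell type's expression profile (sums to 1 over genes).
profiles = rng.dirichlet(np.ones(n_genes), size=n_types).T
# Each column is one sample's cell-type proportions (sums to 1 over types).
proportions = rng.dirichlet(np.ones(n_types), size=n_samples).T

bulk = simulate_bulk(profiles, proportions)
estimated = partial_deconvolve(bulk, profiles)
max_error = float(np.abs(estimated - proportions).max())
```

In this noise-free toy setting the known profiles make the problem a well-posed regression. Without them, both factors are unknown, which is the harder setting that complete deconvolution addresses.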
