CellMix: a comprehensive toolbox for gene expression deconvolution

Gene expression data are typically generated from heterogeneous biological samples composed of multiple cell or tissue types in varying proportions, each contributing to the global gene expression. This heterogeneity is a major confounder in standard analyses such as differential expression analysis, where differences in the relative proportions of the constituent cells may prevent or bias the detection of cell-specific differences. Computational deconvolution of global gene expression is an appealing alternative to costly physical sample separation techniques and enables a more detailed analysis of the underlying biological processes at the cell-type level. To facilitate and popularize the application of such methods, we developed CellMix, an R package that incorporates most state-of-the-art deconvolution methods into an intuitive and extensible framework, providing a single entry point to explore, assess and disentangle gene expression data from heterogeneous samples.

AVAILABILITY AND IMPLEMENTATION: The CellMix package builds on R/Bioconductor and is available from http://web.cbio.uct.ac.za/~renaud/CRAN/web/CellMix. It is currently being submitted to Bioconductor. The package vignettes contain additional information, examples and references.
