AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution

Tissues are complex systems of interacting cell types. Knowing cell-type proportions in a tissue is very important to identify which cells or cell types are targeted by a disease or perturbation. When measuring such responses using RNA-seq, bulk RNA-seq masks cellular heterogeneity. Hence, several computational methods have been proposed to infer cell-type proportions from bulk RNA samples. Their performance with noisy reference profiles highly depends on the set of genes undergoing deconvolution. These genes are often selected based on prior knowledge or a single-criterion test that might not be useful to dissect closely correlated cell types. In this work, we introduce AutoGeneS, a tool that automatically extracts informative genes and reveals the cellular heterogeneity of bulk RNA samples. AutoGeneS requires no prior knowledge about marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types. It can be applied to reference profiles from various sources like single-cell experiments or sorted cell populations. Results from human samples of peripheral blood illustrate that AutoGeneS outperforms other methods. Our results also highlight the impact of our approach on analyzing bulk RNA samples with noisy single-cell reference profiles and closely correlated cell types. Ground truth cell proportions analyzed by flow cytometry confirmed the accuracy of the predictions of AutoGeneS in identifying cell-type proportions. AutoGeneS is available for use via a standalone Python package (https://github.com/theislab/AutoGeneS).

[1]  Rainer Spang,et al.  Loss-Function Learning for Digital Tissue Deconvolution , 2018, RECOMB.

[2]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[3]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[4]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[5]  Kalyanmoy Deb,et al.  Multi-objective Optimisation Using Evolutionary Algorithms: An Introduction , 2011, Multi-objective Evolutionary Optimisation for Product Design and Manufacturing.

[6]  Nancy R. Zhang,et al.  Bulk tissue cell type deconvolution with multi-subject single-cell expression reference , 2018, Nature Communications.

[7]  Leland McInnes,et al.  UMAP: Uniform Manifold Approximation and Projection , 2018, J. Open Source Softw..

[8]  Christoph Hafemeister,et al.  Comprehensive integration of single cell data , 2018, bioRxiv.

[9]  Edda Klipp,et al.  Estimation of immune cell content in tumour tissue using single-cell RNA-seq data , 2017, Nature Communications.

[10]  Damaris Zurell,et al.  Collinearity: a review of methods to deal with it and a simulation study evaluating their performance , 2013 .

[11]  David W. Coit,et al.  Multi-objective optimization using genetic algorithms: A tutorial , 2006, Reliab. Eng. Syst. Saf..

[12]  Leland McInnes,et al.  UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction , 2018, ArXiv.

[13]  Ash A. Alizadeh,et al.  Determining cell-type abundance and expression from bulk tissues with digital cytometry , 2019, Nature Biotechnology.

[14]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[15]  Paul Hoffman,et al.  Integrating single-cell transcriptomic data across different conditions, technologies, and species , 2018, Nature Biotechnology.

[16]  A. Butte,et al.  xCell: digitally portraying the tissue cellular heterogeneity landscape , 2017, Genome Biology.

[17]  Ash A. Alizadeh,et al.  Robust enumeration of cell subsets from tissue expression profiles , 2015, Nature Methods.

[18]  A. Singleton,et al.  Cell population-specific expression analysis of human cerebellum , 2012, BMC Genomics.

[19]  Bernhard Sch,et al.  A tutorial on n-support vector machines , 2005 .

[20]  Chih-Jen Lin,et al.  A tutorial on?-support vector machines , 2005 .

[21]  Xiaoling Li,et al.  A novel computational complete deconvolution method using RNA-seq data , 2018 .

[22]  Russell Schwartz,et al.  Applying unmixing to gene expression data for tumor phylogeny inference , 2010, BMC Bioinformatics.

[23]  Maxim N. Artyomov,et al.  Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures , 2019, Nature Communications.

[24]  Gregory J. Hunt,et al.  Dtangle: Accurate and Robust Cell Type Deconvolution , 2018, Bioinform..

[25]  Mark M. Davis,et al.  Cell type–specific gene expression differences in complex tissues , 2010, Nature Methods.

[26]  Fabian J Theis,et al.  SCANPY: large-scale single-cell gene expression data analysis , 2018, Genome Biology.

[27]  Paul J. Hoffman,et al.  Comprehensive Integration of Single-Cell Data , 2018, Cell.

[28]  Aleksandra A. Kolodziejczyk,et al.  Accounting for technical noise in single-cell RNA-seq experiments , 2013, Nature Methods.

[29]  M. Ceccarelli,et al.  RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types , 2019, Cell reports.

[30]  Dan Boneh,et al.  On genetic algorithms , 1995, COLT '95.

[31]  Samuel L. Wolock,et al.  A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. , 2016, Cell systems.

[32]  R. Faull,et al.  Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain , 2011, Nature Methods.

[33]  Goldberg,et al.  Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[34]  S. Shen-Orr,et al.  Computational deconvolution: extracting cell type-specific information from heterogeneous samples. , 2013, Current opinion in immunology.

[35]  Eran Bacharach,et al.  Cell composition analysis of bulk genomics using single cell data , 2019, Nature Methods.

[36]  Rose Du,et al.  deconvSeq: deconvolution of cell mixture distribution in sequencing data , 2019, Bioinform..

[37]  Alice E. Smith,et al.  Multi-objective optimization using evolutionary algorithms [Book Review] , 2002, IEEE Transactions on Evolutionary Computation.

[38]  R. Stewart,et al.  Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm , 2016, Genome Biology.