AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution.

Knowing the cell-type proportions of a tissue is essential for identifying which cells or cell types are targeted by a disease or perturbation. Several deconvolution methods have therefore been proposed to infer cell-type proportions from bulk RNA samples. When reference profiles are noisy or cell types are closely correlated, the performance of these methods depends strongly on the set of genes used for deconvolution. In this work, we introduce AutoGeneS, a platform that automatically extracts discriminative genes and reveals the cellular heterogeneity of bulk RNA samples. AutoGeneS requires no prior knowledge of marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types. AutoGeneS can be applied to reference profiles from various sources, such as single-cell experiments or sorted cell populations. Comparison with ground-truth cell proportions measured by flow cytometry confirmed the accuracy of AutoGeneS in estimating cell-type proportions. AutoGeneS is available as a standalone Python package (https://github.com/theislab/AutoGeneS).
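To make the two selection criteria concrete, here is a minimal sketch, not the AutoGeneS API. It assumes that the correlation objective is the sum of pairwise Pearson correlations between cell-type reference profiles restricted to the selected genes, and that the distance objective is the sum of pairwise Euclidean distances between those profiles. In place of a full multi-objective optimizer, it simply scores random candidate gene subsets and keeps the Pareto-optimal ones, that is, the subsets that no other subset beats on both criteria at once. It then estimates proportions with non-negative least squares (`scipy.optimize.nnls`), which is used here only as one example of a regression step. The functions `objectives`, `pareto_select` and `deconvolve` and the synthetic data are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import nnls


def objectives(centroids, genes):
    """Score a gene subset; centroids has shape (n_cell_types, n_genes).

    Assumed objectives: summed pairwise Pearson correlation (to minimize)
    and summed pairwise Euclidean distance (to maximize) between cell types.
    """
    sub = centroids[:, genes]
    pairs = list(itertools.combinations(range(sub.shape[0]), 2))
    corr = sum(np.corrcoef(sub[i], sub[j])[0, 1] for i, j in pairs)
    dist = sum(np.linalg.norm(sub[i] - sub[j]) for i, j in pairs)
    return corr, dist


def dominates(a, b):
    """True if score a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_select(centroids, n_genes, n_candidates=200, seed=0):
    """Random-search stand-in for a multi-objective optimizer: keep non-dominated subsets."""
    rng = np.random.default_rng(seed)
    candidates = [rng.choice(centroids.shape[1], size=n_genes, replace=False)
                  for _ in range(n_candidates)]
    scores = [objectives(centroids, g) for g in candidates]
    front = [k for k in range(n_candidates)
             if not any(dominates(scores[m], scores[k])
                        for m in range(n_candidates) if m != k)]
    return [candidates[k] for k in front], [scores[k] for k in front]


def deconvolve(bulk, centroids, genes):
    """Solve bulk[genes] ~ centroids[:, genes].T @ p with p >= 0, then rescale p to sum to 1."""
    coef, _ = nnls(centroids[:, genes].T, bulk[genes])
    return coef / coef.sum()


# Usage on hypothetical synthetic data: 3 cell types, 500 genes.
rng = np.random.default_rng(1)
centroids = rng.gamma(2.0, 1.0, size=(3, 500))
true_props = np.array([0.5, 0.3, 0.2])
bulk = true_props @ centroids  # noiseless bulk sample mixed from the references

front, scores = pareto_select(centroids, n_genes=50)
best = front[int(np.argmin([s[0] for s in scores]))]  # least-correlated Pareto subset
est = deconvolve(bulk, centroids, best)
print(est)
```

Because the synthetic bulk sample here is an exact, noise-free mixture, the estimate should closely match `true_props`. Gene selection matters in the setting the abstract describes, with noisy references and similar cell types. Choosing among Pareto-optimal subsets, rather than collapsing both criteria into a single weighted score, leaves the trade-off between low correlation and large distance to the user.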
