An elastic-net logistic regression approach to generate classifiers and gene signatures for types of immune cells and T helper cell subsets

Background: Host immune response is coordinated by a variety of specialized cell types that vary in time and location. While host immune response can be studied using conventional low-dimensional approaches, advances in transcriptomics analysis may provide a less biased view. Yet leveraging transcriptomics data to identify immune cell subtypes is challenging: informative gene signatures must be extracted from a high-dimensional transcriptomics space characterized by low sample numbers and by noisy and missing values. To address these challenges, we explore using machine learning methods to select gene subsets and estimate gene coefficients simultaneously.

Results: Elastic-net logistic regression, a type of machine learning, was used to construct separate classifiers for ten different types of immune cells and for five T helper cell subsets. The resulting classifiers were then used to develop gene signatures that best discriminate among immune cell types and T helper cell subsets using RNA-seq datasets. We validated the approach using single-cell RNA-seq (scRNA-seq) datasets, which gave consistent results. In addition, we classified cell types that were previously unannotated. Finally, we benchmarked the proposed gene signatures against other existing gene signatures.

Conclusions: The developed classifiers can be used as priors in predicting the extent and functional orientation of the host immune response in diseases, such as cancer, where transcriptomic profiling of bulk tissue samples and single cells is routinely employed. Such information can provide insight into the mechanistic basis of disease and therapeutic response. The source code and documentation are available through GitHub: https://github.com/KlinkeLab/ImmClass2019.
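To illustrate how elastic-net logistic regression can select genes and estimate their coefficients in a single fit, the following is a minimal sketch, not the authors' implementation (which is in the linked repository). It assumes the commonly used penalty form lam * [(1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1] added to the mean log-loss, and minimizes it by proximal gradient descent. The function names, the gene names, the toy expression values, and the settings of `lam`, `alpha`, `step`, and `n_iter` are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.9, step=0.1, n_iter=2000):
    """Fit a binary logistic model with an elastic-net penalty.

    Minimizes mean log-loss + lam * ((1 - alpha) / 2 * ||beta||_2^2
    + alpha * ||beta||_1) by proximal gradient descent.
    The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad = [0.0] * p
        grad_intercept = 0.0
        for xi, yi in zip(X, y):
            linear = intercept + sum(b * v for b, v in zip(beta, xi))
            err = sigmoid(linear) - yi  # derivative of log-loss w.r.t. linear term
            grad_intercept += err / n
            for j in range(p):
                grad[j] += err * xi[j] / n
        intercept -= step * grad_intercept
        for j in range(p):
            # Gradient step on the smooth part (log-loss + ridge term),
            # followed by soft-thresholding for the L1 term.
            z = beta[j] - step * (grad[j] + lam * (1.0 - alpha) * beta[j])
            beta[j] = soft_threshold(z, step * lam * alpha)
    return intercept, beta


# Hypothetical toy data: 8 samples x 3 genes. GENE_A separates the two
# classes; GENE_B and GENE_C are balanced within each class (uninformative).
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, -1.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cell type of interest, 0 = all others

intercept, beta = fit_elastic_net_logistic(X, y)

# The "gene signature" is read off as the genes with nonzero coefficients.
signature = [g for g, b in zip(genes, beta) if b != 0.0]
print(signature)
```

In this sketch the L1 part of the penalty drives the coefficients of uninformative genes to exactly zero, so the surviving genes form the signature, while the L2 part keeps the remaining coefficients bounded. Fitting one such one-versus-rest model per cell type would give one classifier and one signature per type; how the regularization strength is chosen in practice (for example by cross-validation) is outside the scope of this example.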
