Combining Pathway Analysis and Supervised Machine Learning for the Functional Classification of Single-Cell Transcriptomic Data

The revolution in single-cell technologies has established a novel framework for investigating gene expression profiles at the level of individual cells. Scientists can now investigate the biological variability within a single tissue by producing isolated transcriptomic data for each cell. As a result, a single transcriptomic experiment can yield a distinct expression profile for every cell, posing new challenges for the translational analysis of all these profiles. Pathway analysis tools need to be adapted not only to analyze numerous gene expression profiles simultaneously, but also to compare them, detecting functional differences and commonalities among the cells of the same tissue and separating them into functional subclusters. In this study, we used the output of a single-cell experiment on the hematopoietic system to define a novel framework for the functional comparison of single cells, based on their pathway analysis with Gene Ontology annotation. Thousands of single-cell expression profiles, grouped into 15 different hematopoietic classes, were translated into networks of significant biological mechanisms using the BioInfoMiner platform. We propose a novel framework that exploits these results to construct appropriate feature spaces of functional components, with a view to performing supervised learning on the different hematopoietic cell types and separating their respective single cells according to their functional profile. The constructed classification model achieved notably high precision and sensitivity scores for some cell types, while its overall performance needs to be improved through further conceptual and technical refinements.
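
As a rough illustration of the idea of turning pathway analysis output into a feature space for supervised learning, the sketch below encodes each cell as a binary vector over the annotation terms reported for it, fits a simple nearest-centroid classifier, and computes per-class precision and sensitivity (recall). Everything here is an assumption made for illustration: the term labels (`term_1`, ...), the cell and class names, the binary encoding, and the nearest-centroid classifier are hypothetical stand-ins, not the BioInfoMiner output format or the classification model used in the study.

```python
from collections import defaultdict


def build_vocabulary(cell_terms):
    """Sorted list of every annotation term seen across the training cells."""
    return sorted({term for terms in cell_terms.values() for term in terms})


def vectorize(terms, vocab):
    """Binary vector: 1 if the term was reported for the cell, else 0.
    Terms outside the vocabulary are ignored."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocab]


def fit_centroids(vectors, labels):
    """Mean feature vector per class (nearest-centroid classifier, an assumed choice)."""
    sums, counts = {}, defaultdict(int)
    for cell, vec in vectors.items():
        label = labels[cell]
        previous = sums.get(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(previous, vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Class whose centroid has the smallest squared Euclidean distance to vec."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vec))
    return min(sorted(centroids), key=distance)


def precision_recall(true_labels, predicted_labels, label):
    """Per-class precision and sensitivity (recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: terms reported per cell and the cell's class label.
train_terms = {
    "cell_1": {"term_1", "term_2"},
    "cell_2": {"term_1", "term_2", "term_3"},
    "cell_3": {"term_4", "term_5"},
    "cell_4": {"term_3", "term_4", "term_5"},
}
train_labels = {
    "cell_1": "class_A", "cell_2": "class_A",
    "cell_3": "class_B", "cell_4": "class_B",
}

vocab = build_vocabulary(train_terms)
train_vectors = {cell: vectorize(terms, vocab) for cell, terms in train_terms.items()}
centroids = fit_centroids(train_vectors, train_labels)

# Classify an unseen cell; "term_9" is not in the vocabulary and is ignored.
new_cell_prediction = predict(centroids, vectorize({"term_1", "term_3", "term_9"}, vocab))
```

In practice, one would likely replace the binary encoding with enrichment scores or semantic-similarity weights, and evaluate precision and sensitivity on held-out cells for each of the hematopoietic classes rather than on a toy example.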
