Comparative Single-cell RNA-sequencing Cluster Analysis for Traumatic Brain Injury Marker Genes Detection

Single-cell RNA-sequencing (scRNA-seq) is a high-resolution transcriptomic approach used to discover gene expression patterns among cell types and to study precise biological functions. Unsupervised machine learning (clustering) is of central importance for the analysis of scRNA-seq data. It can identify putative cell types, uncover regulatory relationships, and track cell lineages and trajectories. A key issue in clustering scRNA-seq data is determining which clustering method is appropriate, since different methods can yield different results. Current approaches usually focus on a single method and manually select a seemingly meaningful result. From a biological relevance perspective, it is vital to distinguish between normal and pathogenic cell types using marker genes. We present a learning framework for comparing the outcomes of multiple scRNA-seq clustering methods to identify the best-performing results. We address the challenges of model selection and validation metrics in the context of traumatic brain injury (TBI) applications. We compare the clustering performance of five clustering algorithms and two dimensionality reduction techniques as implemented in the Seurat and Scanpy packages.
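
The general idea of comparing several clustering methods on a shared low-dimensional embedding and ranking them with an internal validation metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework presented here: it uses scikit-learn stand-ins (PCA, k-means, agglomerative clustering, silhouette score, adjusted Rand index) rather than Seurat or Scanpy, it runs on synthetic data from `make_blobs` rather than TBI scRNA-seq data, and the function name `compare_clusterings` and all parameter values are hypothetical.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score


def compare_clusterings(X, n_clusters=3, n_components=10, seed=0):
    """Cluster X with several methods on a shared PCA embedding.

    Hypothetical helper: returns the labels per method and a
    silhouette score per method (higher means tighter, better
    separated clusters).
    """
    # Reduce dimensionality once so every method sees the same input.
    embedding = PCA(n_components=n_components, random_state=seed).fit_transform(X)

    # Stand-in methods; the algorithms compared in the paper are not assumed here.
    methods = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
    }

    labels = {name: model.fit_predict(embedding) for name, model in methods.items()}
    scores = {name: silhouette_score(embedding, lab) for name, lab in labels.items()}
    return labels, scores


# Synthetic stand-in for a cells-by-genes expression matrix (assumption).
X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)

labels, scores = compare_clusterings(X)

# Pick the method with the highest silhouette score.
best = max(scores, key=scores.get)

# Pairwise agreement between the two partitions, independent of label names.
agreement = adjusted_rand_score(labels["kmeans"], labels["agglomerative"])

print(best, scores, agreement)
```

An internal metric such as the silhouette score needs no ground-truth labels, which is why it is used for ranking in this sketch. The adjusted Rand index only measures how much two partitions agree, so it complements the ranking rather than replacing it.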