FEATS: feature selection-based clustering of single-cell RNA-seq data.

MOTIVATION Advances in next-generation sequencing have made it possible to carry out transcriptomic studies at single-cell resolution and generate vast amounts of single-cell RNA sequencing (RNA-seq) data rapidly. Thus, tools to analyze this data need to evolve as well as to improve accuracy and efficiency. RESULTS We present FEATS, a Python software package, that performs clustering on single-cell RNA-seq data. FEATS is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection and integrating data from various experiments. We develop a univariate feature selection-based approach for clustering, which involves the selection of top informative features to improve clustering performance. This is motivated by the fact that cell types are often manually determined using the expression of only a few known marker genes. On a variety of single-cell RNA-seq datasets, FEATS gives superior performance compared with the current tools, in terms of adjusted Rand index and estimating the number of clusters. It achieves a 22% improvement in clustering and more accurately estimates the number of clusters when compared with other tools. In addition to cluster estimation, FEATS also performs outlier detection and data integration while giving an excellent computational performance. Thus, FEATS is a comprehensive clustering tool capable of addressing the challenges during the clustering of single-cell RNA-seq data. AVAILABILITY The installation instructions and documentation of FEATS is available at https://edwinv87.github.io/feats/. SUPPLEMENTARY DATA Supplementary data are available online at https://academic.oup.com/bib.

[1]  B. Tjaden,et al.  De novo assembly of bacterial transcriptomes from RNA-seq data , 2015, Genome Biology.

[2]  J. Marioni,et al.  Heterogeneity in Oct4 and Sox2 Targets Biases Cell Fate in 4-Cell Mouse Embryos , 2016, Cell.

[3]  Tal Nawy,et al.  How single cells do it , 2016, Nature Methods.

[4]  Hongkai Ji,et al.  TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis , 2016, Nucleic acids research.

[5]  Jeffrey M. Perkel,et al.  Single-cell sequencing made simple , 2017, Nature.

[6]  Paul Hoffman,et al.  Integrating single-cell transcriptomic data across different conditions, technologies, and species , 2018, Nature Biotechnology.

[7]  R. Sandberg,et al.  Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells , 2014, Science.

[8]  M. Schaub,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[9]  P. Rousseeuw Least Median of Squares Regression , 1984 .

[10]  S. Teichmann,et al.  Computational and analytical challenges in single-cell transcriptomics , 2015, Nature Reviews Genetics.

[11]  Ruiqiang Li,et al.  Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells , 2013, Nature Structural &Molecular Biology.

[12]  F. Biase,et al.  Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing , 2014, Genome research.

[13]  Laleh Haghverdi,et al.  Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors , 2018, Nature Biotechnology.

[14]  Bo Wang,et al.  Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning , 2016, Nature Methods.

[15]  Aleksandra A. Kolodziejczyk,et al.  Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation , 2015, Cell stem cell.

[16]  Hongshan Guo,et al.  Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos , 2015, Genome Biology.

[17]  Shawn M. Gillespie,et al.  Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma , 2014, Science.

[18]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[19]  Hui Wang,et al.  SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis , 2015, PLoS Comput. Biol..

[20]  H. Clevers,et al.  Single Lgr5 stem cells build crypt–villus structures in vitro without a mesenchymal niche , 2009, Nature.

[21]  N. Neff,et al.  Reconstructing lineage hierarchies of the distal lung epithelium using single cell RNA-seq , 2014, Nature.

[22]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[23]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[24]  Chun-Nan Hsu,et al.  Weakly supervised learning of biomedical information extraction from curated data , 2016, BMC Bioinformatics.

[25]  S. Linnarsson,et al.  Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. , 2011, Genome research.

[26]  Tatsuhiko Tsunoda,et al.  DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture , 2019, Scientific Reports.

[27]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[28]  A. Regev,et al.  Spatial reconstruction of single-cell gene expression , 2015, Nature Biotechnology.

[29]  Kumari Snehkant Lata,et al.  Inferring pathogen-host interactions between Leptospira interrogans and Homo sapiens using network theory , 2019, Scientific Reports.

[30]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[31]  Daichi Shigemizu,et al.  Clustering of Small-Sample Single-Cell RNA-Seq Data via Feature Clustering and Selection , 2019, PRICAI.

[32]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[33]  Hans Clevers,et al.  Single-cell messenger RNA sequencing reveals rare intestinal cell types , 2015, Nature.

[34]  Bonnie Berger,et al.  Efficient integration of heterogeneous single-cell transcriptomes using Scanorama , 2019, Nature Biotechnology.

[35]  Christopher Yau,et al.  pcaReduce: hierarchical clustering of single cell transcriptional profiles , 2015, BMC Bioinformatics.