Data Analysis in Single-Cell Transcriptome Sequencing.

Single-cell transcriptome sequencing, often referred to as single-cell RNA sequencing (scRNA-seq), measures gene expression at the level of individual cells and resolves cellular differences more finely than bulk RNA-seq. By providing more detailed and accurate information, scRNA-seq is expected to advance the understanding of cell function, disease progression, and treatment response. Although scRNA-seq experimental protocols have improved rapidly, many challenges in scRNA-seq data analysis remain. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization and cluster analysis, the two most important challenges in scRNA-seq data analysis. In particular, we present a protocol to discover and validate cancer stem cells (CSCs) using scRNA-seq. We also offer suggestions to help researchers rationally design their scRNA-seq experiments and data analyses in future studies.
