pcaReduce: hierarchical clustering of single cell transcriptional profiles

BackgroundAdvances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have being routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies.ResultsWe have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels.ConclusionsOur novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complimentary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.

[1]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[2]  Aidong Zhang,et al.  Cluster analysis for gene expression data: a survey , 2004, IEEE Transactions on Knowledge and Data Engineering.

[3]  S. Linnarsson,et al.  Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing , 2014, Nature Neuroscience.

[4]  Christoph Bock,et al.  Single‐cell transcriptomes reveal characteristic features of human pancreatic islet cell types , 2015, EMBO reports.

[5]  S. Linnarsson,et al.  Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq , 2015, Science.

[6]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[7]  J. Marioni,et al.  High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin , 2015, Nature Biotechnology.

[8]  R. Sandberg,et al.  Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells , 2014, Science.

[9]  N. Neff,et al.  Reconstructing lineage hierarchies of the distal lung epithelium using single cell RNA-seq , 2014, Nature.

[10]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[11]  E. Marco,et al.  Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape , 2014, Proceedings of the National Academy of Sciences.

[12]  S. Bodovitz,et al.  Single cell analysis: the new frontier in 'omics'. , 2010, Trends in biotechnology.

[13]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[14]  Chen Xu,et al.  Identification of cell types from single-cell transcriptomes using a novel clustering method , 2015, Bioinform..

[15]  A. Saliba,et al.  Single-cell RNA-seq: advances and future challenges , 2014, Nucleic acids research.

[16]  Rona S. Gertner,et al.  Single cell RNA Seq reveals dynamic paracrine control of cellular variation , 2014, Nature.

[17]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[18]  Sarah A Teichmann,et al.  Computational assignment of cell-cycle stage from single-cell transcriptome data. , 2015, Methods.

[19]  Hans Clevers,et al.  Single-cell messenger RNA sequencing reveals rare intestinal cell types , 2015, Nature.

[20]  I. Amit,et al.  Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types , 2014, Science.

[21]  Cole Trapnell,et al.  Defining cell types and states with single-cell genomics , 2015, Genome research.

[22]  S. Teichmann,et al.  Computational and analytical challenges in single-cell transcriptomics , 2015, Nature Reviews Genetics.

[23]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[24]  I. Macaulay,et al.  Single Cell Genomics: Advances and Future Perspectives , 2014, PLoS genetics.

[25]  L. Hubert,et al.  Comparing partitions , 1985 .

[26]  Adrian E. Raftery,et al.  MCLUST Version 3: An R Package for Normal Mixture Modeling and Model-Based Clustering , 2006 .

[27]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[28]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[29]  Pierre Baldi,et al.  Autoencoders, Unsupervised Learning, and Deep Architectures , 2011, ICML Unsupervised and Transfer Learning.

[30]  M. Schaub,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[31]  Shawn M. Gillespie,et al.  Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma , 2014, Science.

[32]  Kurt Hornik,et al.  Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[33]  Hui Wang,et al.  SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis , 2015, PLoS Comput. Biol..

[34]  S. Quake,et al.  Single-cell genomics , 2011, Nature Methods.

[35]  R. Sandberg Entering the era of single-cell transcriptomics in biology and medicine , 2013, Nature Methods.

[36]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[37]  Chris H. Q. Ding,et al.  K-means clustering via principal component analysis , 2004, ICML.

[38]  Dan Tsafrir,et al.  Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices , 2005, Bioinform..

[39]  A. Regev,et al.  Spatial reconstruction of single-cell gene expression data , 2015 .