Saturating Single-Cell Atlas Datasets

High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. Efficient generation of such an atlas will depend on sufficient sampling of the diverse cell types while remaining cost-effective enough to enable a comprehensive examination of organs, developmental stages, and individuals. To examine the relationship between cell number and transcriptional heterogeneity in the context of unbiased cell type classification, we explicitly explored the population structure of a publicly available 1.3 million cell dataset from the E18.5 mouse brain. We propose a computational framework for inferring the saturation point of cluster discovery in a single-cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a “complexity index”, which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited-complexity dataset, we explored whether biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells (20,000). Together, these findings suggest that most of the biologically interpretable insights from the 1.3 million cells can be recapitulated by analyzing 50,000 randomly selected cells, indicating that, rather than profiling few individuals at high “cellular coverage”, the much-anticipated cell atlasing studies may benefit from profiling more individuals, or many time points, at lower cellular coverage.

Recent efforts seek to create a comprehensive cell atlas of the human body1,2. Current technology, however, makes it prohibitively expensive to analyze every cell. Therefore, designing effective sampling strategies will be critical to generate a working atlas in an efficient, cost-effective, and streamlined manner. The advent of single-cell and single-nucleus mRNA sequencing (RNAseq) in droplet format3,4 now enables large-scale sampling of cells from any tissue, and a recently released publicly available dataset of 1.3 million single cells from the E18.5 mouse brain generated with the 10X Chromium5 provides an opportunity to explore the relationship between population structure and the number of sampled cells necessary to reveal the underlying diversity of cell types. Here, we present a framework for how researchers can evaluate whether a dataset has reached saturation, and we estimate how many cells would be required to generate an atlas of the sample analyzed here. This framework can be applied to any organ-specific or cell-type-specific atlas for any organism.
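The idea of judging saturation by cluster preservation in downsampled datasets can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`best_jaccard`, `preservation_curve`, `saturation_point`), the use of Jaccard overlap as the preservation score, and the 0.9 threshold are all assumptions chosen for illustration, and `cluster_fn` is a hypothetical stand-in for whatever clustering pipeline is run on the downsampled cells.

```python
import numpy as np


def best_jaccard(in_ref, sub_labels):
    """Best Jaccard overlap between one reference cluster (boolean mask)
    and any cluster found in the downsampled clustering."""
    in_ref = np.asarray(in_ref, dtype=bool)
    sub_labels = np.asarray(sub_labels)
    if not in_ref.any():
        # The reference cluster was not sampled at all, so it is not preserved.
        return 0.0
    best = 0.0
    for s in np.unique(sub_labels):
        in_sub = sub_labels == s
        inter = np.sum(in_ref & in_sub)
        union = np.sum(in_ref | in_sub)
        best = max(best, inter / union)
    return float(best)


def preservation_curve(ref_labels, cluster_fn, sizes, seed=0):
    """Mean preservation of the full-data clusters at each downsampled size.

    ref_labels : cluster labels obtained from the full dataset
    cluster_fn : hypothetical callable; takes the sampled cell indices and
                 returns one cluster label per sampled cell
    sizes      : numbers of cells to sample (each at most len(ref_labels))
    """
    rng = np.random.default_rng(seed)
    ref_labels = np.asarray(ref_labels)
    all_clusters = np.unique(ref_labels)
    curve = []
    for n in sizes:
        # Random downsampling without replacement.
        idx = rng.choice(ref_labels.size, size=n, replace=False)
        sub_labels = np.asarray(cluster_fn(idx))
        scores = [best_jaccard(ref_labels[idx] == r, sub_labels)
                  for r in all_clusters]
        curve.append(float(np.mean(scores)))
    return curve


def saturation_point(sizes, curve, threshold=0.9):
    """Smallest sampled size whose mean preservation reaches the threshold
    (the threshold value is an illustrative assumption)."""
    for n, score in zip(sizes, curve):
        if score >= threshold:
            return n
    return None
```

In practice, `cluster_fn` would re-run the same clustering procedure on the expression profiles of the sampled cells, and the sizes would span several orders of magnitude so that the point where the curve flattens is visible.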

[1]  Evan Z. Macosko, et al. Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics, 2016, Cell.

[2]  Luca Muzio, et al. Foxg1 Confines Cajal-Retzius Neuronogenesis and Hippocampal Morphogenesis to the Dorsomedial Pallium, 2005, The Journal of Neuroscience.

[3]  C. Englund, et al. Cajal-Retzius cells in the mouse: transcription factors, neurotransmitters, and birthdays suggest a pallial origin, 2003, Brain research. Developmental brain research.

[4]  Evan Z. Macosko, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets, 2015, Cell.

[5]  Allon M. Klein, et al. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells, 2015, Cell.

[6]  G. Watts. From vision to reality, 1991, Nursing.

[7]  Ian R. Wickersham, et al. The BRAIN Initiative Cell Census Consortium: Lessons Learned toward Generating a Comprehensive Brain Cell Atlas, 2017, Neuron.

[8]  Michael J. T. Stubbington, et al. The Human Cell Atlas: from vision to reality, 2017, Nature.

[9]  S. Rétaux, et al. Differential expression of LIM-homeodomain factors in Cajal-Retzius cells of primates, rodents, and birds, 2010, Cerebral cortex.

[10]  Staci A. Sorensen, et al. Adult Mouse Cortical Cell Taxonomy Revealed by Single Cell Transcriptomics, 2016.

[11]  Sébastien Vigneau, et al. Multiple origins of Cajal-Retzius cells at the borders of the developing pallium, 2005, Nature Neuroscience.

[12]  Grace X. Y. Zheng, et al. Massively parallel digital transcriptional profiling of single cells, 2016, Nature Communications.

[13]  T. Curran, et al. A protein related to extracellular matrix proteins deleted in the mouse mutant reeler, 1995, Nature.