A multi-center cross-platform single-cell RNA sequencing reference dataset

Single-cell RNA sequencing (scRNA-seq) is developing rapidly, and investigators who want to use this technology face many choices of experimental platform and bioinformatics method. Reference datasets are urgently needed for benchmarking these platforms and methods. To be broadly applicable, such datasets should be generated from renewable, well-characterized reference samples and processed at multiple centers across different platforms. Here we present a benchmarking resource comprising 20 scRNA-seq datasets acquired either as mixtures or as individual samples from two biologically distinct cell lines, for which extensive multi-platform whole-genome sequencing data are also available. The scRNA-seq datasets were generated on multiple popular platforms across four sequencing centers. We believe this resource will be of great value to the single-cell community as a reference for evaluating bioinformatics methods for scRNA-seq analysis, including but not limited to data preprocessing, imputation, normalization, clustering, batch correction, and differential analysis.
