BCseq: accurate single cell RNA-seq quantification with bias correction

Abstract

With rapid technical advances, single cell RNA-seq (scRNA-seq) has been used to detect cell subtypes that exhibit distinct gene expression profiles and to trace cell transitions in development and disease. However, the potential of scRNA-seq for new discoveries is constrained by the robustness of the subsequent data analysis. Here we propose a robust model, BCseq (bias-corrected sequencing analysis), to accurately quantify gene expression from scRNA-seq data. BCseq corrects the inherent biases of scRNA-seq in a data-adaptive manner and effectively removes technical noise. BCseq rescues dropouts through a weighted consideration of similar cells. Cells with greater sequencing depth make a nonlinearly larger contribution to the quantification. Furthermore, BCseq assigns a quality score to the expression of each gene in each cell, giving users an objective measure for selecting genes for downstream analysis. Compared with existing scRNA-seq methods, BCseq shows increased robustness in detecting differentially expressed (DE) genes and in classifying cell subtypes.
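The abstract names the ingredients of dropout rescue (similar cells, weighted contributions, a nonlinear dependence on sequencing depth) but not BCseq's formulas. The following is a minimal toy sketch under our own assumptions, not BCseq's model: it normalises counts to counts per million, scores neighbouring cells by cosine similarity, weights each neighbour by similarity × depth^alpha (the function name `rescue_dropouts` and the exponent `alpha` are hypothetical), and replaces each zero with the weighted mean of the neighbours' values for that gene.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rescue_dropouts(counts, alpha=0.5):
    """Hypothetical illustration, NOT the BCseq algorithm.

    counts: list of cells, each a list of raw gene counts (cells x genes).
    alpha:  assumed exponent; depth ** alpha makes a neighbour's weight grow
            nonlinearly with its sequencing depth (alpha = 0 ignores depth).
    Returns counts-per-million expression with zero entries filled in.
    """
    depths = [sum(cell) for cell in counts]
    # Depth-normalise so that cells sequenced to different depths are comparable.
    cpm = [[c * 1e6 / d if d > 0 else 0.0 for c in cell]
           for cell, d in zip(counts, depths)]

    rescued = []
    for i, cell in enumerate(cpm):
        # Weight every other cell by (non-negative) similarity times depth ** alpha.
        weights = [
            0.0 if j == i else max(cosine_similarity(cell, other), 0.0) * depths[j] ** alpha
            for j, other in enumerate(cpm)
        ]
        total = sum(weights)
        new_cell = list(cell)
        for g, value in enumerate(cell):
            if value == 0.0 and total > 0:
                # Replace a zero (possible dropout) with the weighted mean of the neighbours.
                new_cell[g] = sum(w * cpm[j][g] for j, w in enumerate(weights)) / total
        rescued.append(new_cell)
    return rescued


if __name__ == "__main__":
    counts = [[10, 0, 5], [12, 3, 6], [100, 30, 50]]
    print(rescue_dropouts(counts)[0])
```

Raising `alpha` above zero pulls a rescued value toward the deeper, presumably more reliable, neighbours; with `alpha = 0` every similar cell counts equally regardless of depth. Non-zero entries are left at their normalised values.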
