netSmooth: Network-smoothing based imputation for single cell RNA-seq

Single-cell RNA-seq (scRNA-seq) experiments suffer from a range of characteristic technical biases, such as dropouts (zero or near-zero counts) and high variance. Current analysis methods impute missing values by various forms of local averaging or regression, which often amplifies biases already inherent in the data. We present netSmooth, a network-diffusion-based method that uses priors for the covariance structure of gene expression profiles in scRNA-seq experiments to smooth expression values. We demonstrate that netSmooth improves the clustering of scRNA-seq data from distinct cell populations, time-course experiments, and cancer genomics studies. We provide an R package for our method, available at: https://github.com/BIMSBbioinfo/netSmooth.
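To make "network-diffusion-based smoothing" concrete, here is a minimal Python sketch of one common form of network diffusion, a random walk with restart over a gene-gene network. It assumes an unweighted, undirected network given as an edge list and a degree-normalized (column-stochastic) adjacency matrix W. It also assumes a restart parameter `alpha`, and it iterates f ← αWf + (1 − α)e for one cell's expression vector e until convergence, which approaches (1 − α)(I − αW)⁻¹e. The function names and parameters are hypothetical illustrations. This is not the netSmooth R package's API, and it is not a claim about its exact implementation.

```python
from typing import Iterable, List, Sequence, Tuple


def column_normalize(n_genes: int, edges: Iterable[Tuple[int, int]]) -> List[List[float]]:
    """Build W = A D^-1 (each column sums to 1) from an undirected, unweighted edge list.

    Hypothetical helper for illustration only.
    """
    adj = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        if i != j:
            adj[i][j] = 1.0
            adj[j][i] = 1.0
    for j in range(n_genes):
        degree = sum(adj[i][j] for i in range(n_genes))
        if degree == 0:
            # Assumption: a gene with no network neighbours gets a self-loop,
            # so diffusion leaves its value unchanged.
            adj[j][j] = 1.0
        else:
            for i in range(n_genes):
                adj[i][j] /= degree
    return adj


def smooth_cell(
    expr: Sequence[float],
    w: List[List[float]],
    alpha: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Random walk with restart: iterate f <- alpha * W f + (1 - alpha) * expr.

    For 0 <= alpha < 1 this is a contraction, so it converges to
    (1 - alpha) * (I - alpha * W)^-1 * expr. Hypothetical helper for illustration.
    """
    n = len(expr)
    if n == 0:
        return []
    f = list(expr)
    for _ in range(max_iter):
        new = [
            alpha * sum(w[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * expr[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


if __name__ == "__main__":
    # Toy network: genes 0-1-2 form a path, gene 3 has no interactions.
    W = column_normalize(4, [(0, 1), (1, 2)])
    # Gene 1 is a dropout (zero count) in this cell.
    cell = [4.0, 0.0, 4.0, 2.0]
    smoothed = smooth_cell(cell, W, alpha=0.5)
    print([round(v, 3) for v in smoothed])  # [2.667, 2.667, 2.667, 2.0]
```

In this toy example the dropout gene 1 borrows signal from its two neighbours, and genes 0 to 2 all converge to 8/3. The isolated gene 3 keeps its value of 2.0. Because W is column-stochastic, the cell's total expression (10.0) is preserved. Larger `alpha` spreads values further along the network, and smaller `alpha` keeps the result closer to the observed counts.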