Embracing the dropouts in single-cell RNA-seq dynamics modelling

Single-cell RNA sequencing (scRNA-seq) is one of the most exciting technological breakthroughs, and many believe it will revolutionise medical research. At the same time, this new technology has created some urgent data-mining challenges. Unlike data obtained from older bulk sequencing technology, scRNA-seq data are noisier and more biased. One particular challenge is dropouts, where a low amount of mRNA leads to zero detection. Recently, an appealing idea has been to embrace dropouts and use potential dropout patterns for analysis, which has led to surprising results. In this paper, we take this idea further and focus on the problem of recovering gene dynamics from single-cell data. We show that dropouts can mislead the most commonly used model into producing the wrong dynamics. We propose a solution with two components: a nonlinear neural model based on neural ODEs, and a hurdle distribution that can adapt to potential signals in dropout patterns. We provide empirical evidence demonstrating the advantages of our proposed model over a state-of-the-art method for scRNA-seq analysis.
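
To make the hurdle idea concrete, the following is a minimal sketch of a hurdle likelihood for count data. It is an illustration only, not the model used in the paper: the zero-truncated Poisson count component, the function names `hurdle_poisson_logpmf` and `hurdle_nll`, and the parameters `pi_zero` and `rate` are all assumptions introduced here. A hurdle distribution models zeros with their own probability and models positive counts with a separate distribution truncated at zero, so the zero (dropout) rate can carry its own signal.

```python
import math


def hurdle_poisson_logpmf(y: int, pi_zero: float, rate: float) -> float:
    """Log-probability of a count y under a hurdle model.

    Assumptions (illustrative, not taken from the paper):
      - pi_zero in (0, 1) is the probability of observing a zero (a dropout).
      - Positive counts follow a zero-truncated Poisson with mean parameter rate > 0.
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    if y == 0:
        # Zeros are handled entirely by the hurdle component.
        return math.log(pi_zero)
    # Ordinary Poisson log-pmf at y.
    log_poisson = y * math.log(rate) - rate - math.lgamma(y + 1)
    # Renormalise over y >= 1 by dividing by 1 - P(Poisson = 0) = 1 - exp(-rate).
    log_truncation = math.log1p(-math.exp(-rate))
    return math.log1p(-pi_zero) + log_poisson - log_truncation


def hurdle_nll(counts, pi_zero: float, rate: float) -> float:
    """Negative log-likelihood of a sequence of counts under the hurdle model."""
    return -sum(hurdle_poisson_logpmf(y, pi_zero, rate) for y in counts)


if __name__ == "__main__":
    # Hypothetical expression counts for one gene across a few cells.
    example_counts = [0, 0, 3, 1, 0, 5, 2]
    print(hurdle_nll(example_counts, pi_zero=0.4, rate=2.5))
```

In a dynamics model, `pi_zero` and `rate` would typically be functions of the latent state rather than constants, so that the zero pattern itself can inform the fit; how the paper parameterises them is not stated in the abstract.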
