Repeated Decision Stumping Distils Simple Rules from Single Cell Data

Here we introduce repeated decision stumping to distil simple models from single cell data. We develop decision trees of depth one (hence ‘stumps’) to identify, in an inductive manner, gene products involved in driving cell fate transitions. In applications to published data we are able to discover the key players involved in these processes in an unbiased manner, without prior knowledge. The approach is computationally efficient, has remarkable predictive power, and yields robust and statistically stable predictors: the same set of candidates is generated by applying the algorithm to different subsamples of the data.
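The idea can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: a stump is taken to be a single threshold on one gene, "repeated" is read as refitting after removing the gene just selected, and stability is checked by counting how often each gene is selected across random subsamples. The function names (`best_stump`, `repeated_stumping`, `selection_frequency`) and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def best_stump(X, y):
    """Return (feature index, threshold, accuracy) of the best depth-one tree.

    X is a cells-by-genes array; y holds binary labels (0/1) for two cell states.
    Returns feature index -1 if no feature offers a valid split.
    """
    best = (-1, 0.0, 0.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            acc = np.mean((X[:, j] > t) == y)
            acc = max(acc, 1.0 - acc)  # allow either orientation of the split
            if acc > best[2]:
                best = (j, float(t), float(acc))
    return best


def repeated_stumping(X, y, n_rounds=3):
    """Assumed reading of 'repeated': fit a stump, drop its gene, fit again."""
    remaining = list(range(X.shape[1]))
    ranked = []
    for _ in range(min(n_rounds, len(remaining))):
        j_local, t, acc = best_stump(X[:, remaining], y)
        if j_local < 0:  # no informative split left
            break
        ranked.append((remaining.pop(j_local), t, acc))
    return ranked


def selection_frequency(X, y, n_rounds=2, n_subsamples=20, frac=0.5, seed=0):
    """Fraction of random subsamples in which each gene is selected."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    size = int(frac * len(y))
    for _ in range(n_subsamples):
        idx = rng.choice(len(y), size=size, replace=False)
        for j, _, _ in repeated_stumping(X[idx], y[idx], n_rounds):
            counts[j] += 1
    return counts / n_subsamples
```

On synthetic data where only a few columns differ between the two labels, `repeated_stumping` should rank those columns first, and `selection_frequency` should be close to one for them and close to zero for the rest. Exhaustive threshold search is used here for clarity only; it is not claimed to match the paper's implementation or its efficiency.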
