Visualizing Transitions and Structure for Biological Data Exploration

Abstract

In the era of ‘Big Data’ there is a pressing need for tools that provide human-interpretable visualizations of emergent patterns in high-throughput, high-dimensional data. Further, to enable insightful data exploration, such visualizations should faithfully capture and emphasize emergent structures and patterns without imposing prior assumptions on the shape or form of the data. In this paper, we present PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding), an unsupervised low-dimensional embedding for data visualization designed to address these needs. Unlike methods commonly used for visualization, such as PCA and t-SNE, PHATE captures and highlights both local and global structure in the data. In particular, in addition to clustering patterns, PHATE uncovers and emphasizes progressions and transitions (when they exist) in the data, which other visualization methods often miss. Such patterns are especially important in biological data, for example single-cell phenotypes at different phases of differentiation, patients at different stages of disease progression, and gut microbial compositions that vary gradually between individuals, even within the same enterotype. The PHATE embedding is based on a novel informational distance that captures long-range nonlinear relations in the data by computing energy potentials of data-adaptive diffusion processes. We demonstrate the effectiveness of the resulting visualizations in revealing insights into a wide variety of biomedical data, including single-cell RNA-sequencing, mass cytometry, gut microbiome sequencing, human SNP, and Hi-C data, as well as non-biomedical data such as Facebook network and facial image data. To validate the capability of PHATE to enable exploratory analysis, we generate a new dataset of 31,000 single cells from a human embryoid body differentiation system.
Here, PHATE provides a comprehensive picture of the differentiation process while visualizing major and minor branching trajectories in the data. We validate that all known cell types are recapitulated in the PHATE embedding in their proper organization. Furthermore, the global picture of the system offered by PHATE allows us to connect parts of the developmental progression and to characterize novel regulators associated with developmental lineages.
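
The abstract describes the embedding as built on energy potentials of data-adaptive diffusion processes. The following is a minimal sketch of one way to realize such a diffusion-potential distance. It is not the PHATE reference implementation, and every concrete choice is an assumption made for illustration: a Gaussian kernel with a fixed bandwidth `sigma` (rather than a data-adaptive kernel), a fixed diffusion time `t`, potentials taken as the negative log of the t-step transition probabilities, and classical MDS swapped in for the final 2-D layout. The function name `potential_embedding` and its parameters are hypothetical.

```python
import numpy as np


def potential_embedding(X, sigma=1.0, t=10, n_components=2, eps=1e-12):
    """Illustrative diffusion-potential embedding (hypothetical sketch, not PHATE itself)."""
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # ASSUMPTION: Gaussian affinities with one fixed bandwidth for all points.
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # Row-normalize the affinities into a Markov (diffusion) operator.
    P = K / K.sum(axis=1, keepdims=True)

    # Propagate the diffusion for t steps (t must be a non-negative integer).
    Pt = np.linalg.matrix_power(P, t)

    # ASSUMPTION: potentials are -log of the t-step transition probabilities;
    # eps guards against log(0).
    U = -np.log(Pt + eps)

    # Potential distance: squared Euclidean distance between rows of U.
    su = np.sum(U ** 2, axis=1)
    pd2 = np.maximum(su[:, None] + su[None, :] - 2.0 * U @ U.T, 0.0)

    # ASSUMPTION: classical (Torgerson) MDS on the potential distances.
    n = pd2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ pd2 @ J                       # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]  # indices of the largest ones
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic clusters of 20 points each.
    X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                   rng.normal(10.0, 0.5, size=(20, 2))])
    Y = potential_embedding(X, sigma=1.0, t=10)
    print(Y.shape)
```

The `-log` step is what turns diffusion probabilities into potentials: it compresses differences among large probabilities and expands differences among small ones, so distances between rows of `U` are not dominated by each point's immediate neighborhood.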
