Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations

Significance Biological samples are often heterogeneous mixtures of different types of cells. Suppose we have two single-cell datasets, each providing information on a different cellular feature and each generated on a different sample from this mixture. The clusterings of cells in the two samples should then be coupled, because both reflect the underlying cell types in the same mixture. This “coupled clustering” problem is a new problem that existing clustering methods do not cover. In this paper, we develop an approach to solving it based on the coupling of two nonnegative matrix factorizations. The method should be useful for integrative single-cell genomics analysis tasks such as the joint analysis of single-cell RNA-sequencing and single-cell ATAC-sequencing data.

Abstract When different types of functional genomics data are generated on single cells from different samples of the same heterogeneous population, the clusterings of cells in the different samples should be coupled. We formulate this “coupled clustering” problem as an optimization problem and propose the method of coupled nonnegative matrix factorizations (coupled NMF) for its solution. The method is illustrated by the integrative analysis of single-cell RNA-sequencing (RNA-seq) and single-cell ATAC-sequencing (ATAC-seq) data.
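To make the idea of coupling two factorizations concrete, the following is a minimal sketch and not the objective or algorithm from the paper, which this text does not spell out. It assumes an expression matrix `X1` (genes × cells from sample 1), an accessibility matrix `X2` (peaks × cells from sample 2), and a hypothetical nonnegative matrix `A` (genes × peaks) linking peaks to genes. Each dataset gets its own factorization, and an assumed penalty `lam * ||W1 - A W2||^2` ties the two sets of cluster profiles together. The function names, the form of the penalty, and the multiplicative updates are all illustrative choices.

```python
import numpy as np


def objective(X1, X2, A, W1, H1, W2, H2, lam):
    """Both reconstruction errors plus the assumed coupling penalty."""
    fit1 = np.linalg.norm(X1 - W1 @ H1) ** 2   # squared Frobenius norm
    fit2 = np.linalg.norm(X2 - W2 @ H2) ** 2
    couple = np.linalg.norm(W1 - A @ W2) ** 2
    return fit1 + fit2 + lam * couple


def coupled_nmf(X1, X2, A, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Illustrative coupled NMF: X1 ~ W1 H1, X2 ~ W2 H2, with W1 ~ A W2.

    All inputs must be nonnegative. Returns the four factors and the
    objective value recorded before each iteration and after the last one.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.random((X1.shape[0], k)) + eps
    H1 = rng.random((k, X1.shape[1])) + eps
    W2 = rng.random((X2.shape[0], k)) + eps
    H2 = rng.random((k, X2.shape[1])) + eps

    history = []
    for _ in range(n_iter):
        history.append(objective(X1, X2, A, W1, H1, W2, H2, lam))
        # Multiplicative updates: each factor is scaled by the ratio of the
        # negative to the positive part of its gradient, so it stays >= 0.
        W1 *= (X1 @ H1.T + lam * (A @ W2)) / (W1 @ (H1 @ H1.T) + lam * W1 + eps)
        H1 *= (W1.T @ X1) / (W1.T @ W1 @ H1 + eps)
        W2 *= (X2 @ H2.T + lam * (A.T @ W1)) / (
            W2 @ (H2 @ H2.T) + lam * (A.T @ (A @ W2)) + eps
        )
        H2 *= (W2.T @ X2) / (W2.T @ W2 @ H2 + eps)
    history.append(objective(X1, X2, A, W1, H1, W2, H2, lam))
    return W1, H1, W2, H2, history


def cluster_labels(H):
    """Assign each cell (column of H) to the factor with the largest weight."""
    return H.argmax(axis=0)


if __name__ == "__main__":
    # Synthetic nonnegative data, used only to show the call pattern.
    rng = np.random.default_rng(1)
    X1 = rng.random((30, 12))   # 30 genes, 12 cells
    X2 = rng.random((40, 15))   # 40 peaks, 15 cells
    A = rng.random((30, 40))    # hypothetical peak-to-gene link weights
    W1, H1, W2, H2, history = coupled_nmf(X1, X2, A, k=3)
    print(cluster_labels(H1), cluster_labels(H2))
```

Because the factor index `k` is shared through the coupling term, cluster `j` in the first sample and cluster `j` in the second sample are encouraged to describe the same cell type. With `lam=0` the sketch reduces to two independent NMF runs.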
