Interpreting structure in sequence count data with differential expression analysis allowing for grades of membership

“Parts-based” representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure in single-cell sequencing data sets, in particular structure that is not captured as well by clustering or other dimensionality reduction methods. However, interpreting the individual “parts” remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership in multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
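To make the idea of partial membership concrete, the following is a minimal sketch for a single gene. It assumes a Poisson model in which the expected count of cell i is its size factor times a membership-weighted average of per-group expression rates. The abstract does not specify this model, so the model, the function name `fit_gom_rates`, and the arguments `x`, `s`, and `L` are illustrative assumptions rather than the authors' implementation. The rates are fit with a standard EM (multiplicative) update for Poisson mixtures.

```python
import numpy as np

def fit_gom_rates(x, s, L, n_iter=200):
    """Estimate per-group expression rates p_k for one gene.

    Assumed (illustrative) model: x_i ~ Poisson(s_i * sum_k L[i, k] * p_k),
    where x_i is the count in cell i, s_i is a size factor, and row i of L
    holds the cell's non-negative memberships, summing to 1.
    Every group is assumed to have some membership mass.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    L = np.asarray(L, dtype=float)
    # Initialize every group at the pooled rate.
    p = np.full(L.shape[1], x.sum() / s.sum())
    denom = (s[:, None] * L).sum(axis=0)  # sum_i s_i * L[i, k]
    for _ in range(n_iter):
        lam = s * (L @ p)  # expected counts under the current rates
        ratio = np.divide(x, lam, out=np.zeros_like(x), where=lam > 0)
        # EM update: redistribute each count across groups, then renormalize.
        p = p * ((s * ratio) @ L) / denom
    return p

# Hypothetical data: four cells, two groups, mixed memberships.
x = [2, 4, 10, 20]
s = [1.0, 1.0, 2.0, 2.0]
L = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]]
p = fit_gom_rates(x, s, L)
lfc = np.log2(p[1] / p[0])  # log-fold change of group 2 relative to group 1
```

When every row of `L` is one-hot (each cell belongs to exactly one group), the estimate reduces to the total count in each group divided by the total size factor in that group, which is the usual two-group comparison. Under these assumptions, the grade-of-membership version can be read as a generalization of standard differential expression analysis.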
