Probabilistic Dimensionality Reduction via Structure Learning

We propose an alternative probabilistic dimensionality reduction framework that naturally integrates a generative model with the locality information of the data. Based on this framework, we present a new model that learns a set of embedding points in a low-dimensional space while retaining the inherent structure of the high-dimensional data. The objective function of this model can be equivalently interpreted as two coupled learning problems: structure learning and learning of the projection matrix. Motivated by this interpretation, we propose a second model that finds a set of embedding points directly forming an explicit graph structure. We prove that the model that learns explicit graphs generalizes the reversed graph embedding method while also admitting a natural Bayesian interpretation, which greatly facilitates data visualization and scientific discovery in downstream analysis. Extensive experiments demonstrate that the proposed framework retains the inherent structure of datasets and achieves competitive quantitative results on various performance evaluation criteria.
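
As a rough illustration of the coupled interpretation above, the sketch below alternates between the two subproblems: with the graph weights fixed, a projection matrix is obtained by minimizing a Laplacian trace objective via an eigendecomposition; with the projection fixed, the weights are refreshed from pairwise distances in the projected space. Everything here is an assumption made for illustration: the function name `probabilistic_dr_sketch`, the quadratic objective tr(Wᵀ X L Xᵀ W), and the Gaussian-kernel weight update are stand-ins, not the paper's Bayesian formulation.

```python
import numpy as np

def probabilistic_dr_sketch(X, dim=2, gamma=1.0, n_iter=20):
    """Alternating sketch of structure learning + projection learning.

    X : (d, n) data matrix, one (preferably centered) sample per column.
    Returns W (d, dim) projection and S (n, n) row-stochastic affinities.

    NOTE: illustrative stand-in only -- the Gaussian-kernel weight update
    and the trace-minimization step are assumptions, not the paper's model.
    """
    # Initialize pairwise squared distances in the input space.
    D = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    for _ in range(n_iter):
        # Structure step: heat-kernel graph weights from current distances.
        S = np.exp(-D / gamma)
        np.fill_diagonal(S, 0.0)
        S /= S.sum(axis=1, keepdims=True)   # row-normalize
        # Projection step: minimize tr(W^T X L X^T W) s.t. W^T W = I,
        # i.e. take the bottom eigenvectors of X L X^T.
        A = 0.5 * (S + S.T)
        L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
        _, eigvecs = np.linalg.eigh(X @ L @ X.T)
        W = eigvecs[:, :dim]
        # Refresh distances in the projected space; the next iteration
        # turns them back into graph weights.
        Z = W.T @ X
        D = np.sum((Z[:, :, None] - Z[:, None, :]) ** 2, axis=0)
    return W, S
```

For example, `W, S = probabilistic_dr_sketch(np.random.randn(10, 200))` projects 200 ten-dimensional points to the plane while co-estimating a neighborhood graph, mirroring (in a much simpler, non-probabilistic form) how the two learning problems feed into each other.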
