Principal Graph and Structure Learning Based on Reversed Graph Embedding

Many scientific datasets are high-dimensional, and their analysis usually requires retaining the most important structures in the data. Principal curves are a widely used approach for this purpose. However, many existing methods work only for data whose structure can be formulated mathematically as a curve, which is quite restrictive for real applications. A few methods overcome this limitation, but they either rely on complicated hand-crafted rules designed for a specific task, which adapt poorly to other tasks, or cannot recover an explicit structure from the data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As examples, we propose models that learn either a spanning tree or a weighted undirected $\ell_1$ graph, and we develop a new learning algorithm that simultaneously learns a set of principal points and a graph structure from data. The new algorithm is simple and has guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic datasets and six real-world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
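
The abstract gives no update equations, so the following is only an illustrative sketch of what "learning principal points and a spanning tree simultaneously" could look like, not the paper's algorithm. It assumes hard nearest-node assignment, a squared-Euclidean penalty on tree edges weighted by a hypothetical parameter `lam`, and a simple alternating scheme; all function and parameter names are invented for this example.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph over `points`.

    Returns a list of n-1 edges (i, j) as index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best_cost = [math.inf] * n
    best_parent = [-1] * n
    best_cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_cost[k])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Relax the connection cost of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = sq_dist(points[u], points[v])
                if d < best_cost[v]:
                    best_cost[v] = d
                    best_parent[v] = u
    return edges


def fit_principal_tree(data, n_nodes=8, lam=1.0, n_iter=30, seed=0):
    """Alternate between a tree update and a principal-point update.

    Assumed objective (not taken from the paper):
        sum_i ||x_i - c_k(i)||^2 + lam * sum_{(k,l) in tree} ||c_k - c_l||^2
    """
    rng = random.Random(seed)
    nodes = [list(p) for p in rng.sample(data, n_nodes)]
    dim = len(data[0])
    for _ in range(n_iter):
        # Step 1: with the nodes fixed, the best tree is their MST.
        edges = minimum_spanning_tree(nodes)
        neighbours = [[] for _ in nodes]
        for i, j in edges:
            neighbours[i].append(j)
            neighbours[j].append(i)
        # Step 2: assign every data point to its nearest node.
        sums = [[0.0] * dim for _ in nodes]
        counts = [0] * len(nodes)
        for x in data:
            k = min(range(len(nodes)), key=lambda m: sq_dist(x, nodes[m]))
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += x[d]
        # Step 3: Jacobi-style update; each node moves to the minimiser of
        # the assumed objective with its tree neighbours held at old values.
        new_nodes = []
        for k, node in enumerate(nodes):
            denom = counts[k] + lam * len(neighbours[k])
            if denom == 0:
                new_nodes.append(node)
                continue
            new_nodes.append([
                (sums[k][d] + lam * sum(nodes[j][d] for j in neighbours[k])) / denom
                for d in range(dim)
            ])
        nodes = new_nodes
    # Recompute the tree so the returned edges match the returned nodes.
    return nodes, minimum_spanning_tree(nodes)
```

Calling `fit_principal_tree(points, n_nodes=10)` on a list of coordinate lists returns the fitted node positions and the tree edges as index pairs. Larger values of `lam` pull neighbouring nodes together and give a smoother tree; this sketch makes no convergence claim of its own.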
