Unsupervised Embedding of Hierarchical Structure in Euclidean Space

Deep embedding methods have influenced many areas of unsupervised learning. However, the best methods for learning hierarchical structure use non-Euclidean representations, whereas Euclidean geometry underlies the theory behind many hierarchical clustering algorithms. To bridge the gap between these two areas, we consider learning a non-linear embedding of data into Euclidean space as a way to improve the hierarchical clustering produced by agglomerative algorithms. To learn the embedding, we revisit using a variational autoencoder with a Gaussian mixture prior, and we show that rescaling the latent space embedding and then applying Ward's linkage-based algorithm leads to improved results for both dendrogram purity and the Moseley-Wang cost function. Finally, we complement our empirical results with a theoretical explanation of the success of this approach. We study a synthetic model of the embedded vectors and prove that Ward's method exactly recovers the planted hierarchical clustering with high probability.
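The recovery claim can be illustrated on a toy planted hierarchy. The sketch below is not the paper's implementation: it uses a naive pure-Python version of Ward's method, and hand-picked synthetic 2-D points stand in for the embedded vectors (no variational autoencoder or rescaling step is involved). The cluster centers, noise level, and helper names `ward_cost` and `ward` are assumptions chosen for illustration.

```python
import random


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (na, ma, _), (nb, mb, _) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ma, mb))
    return na * nb / (na + nb) * sq_dist


def ward(points):
    """Naive Ward's method. Returns the merges in order as (members_a, members_b)."""
    # Each cluster is (size, centroid, frozenset of point indices).
    clusters = [(1, tuple(p), frozenset([i])) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair whose merge increases the total variance the least.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (na, ma, sa), (nb, mb, sb) = clusters[i], clusters[j]
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ma, mb))
        merges.append((sa, sb))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((na + nb, centroid, sa | sb))
    return merges


# Assumed toy hierarchy: leaves 0 and 1 are siblings, as are leaves 2 and 3.
rng = random.Random(0)
centers = [(0.0, 0.0), (2.0, 0.0), (20.0, 0.0), (22.0, 0.0)]
points, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(5):
        points.append((cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)))
        labels.append(label)

merges = ward(points)
left, right = merges[-1]  # the final merge is the root split of the dendrogram
top_split = sorted([sorted({labels[i] for i in left}),
                    sorted({labels[i] for i in right})])
print(top_split)
```

With well-separated leaves like these, the root split should group leaves 0 and 1 against leaves 2 and 3, matching the planted tree. This sketch takes cubic time and is meant only for small inputs.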
