From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning problem that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function that measures the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue of the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We show that, after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1 + ε)-factor approximation of Dasgupta's optimal tree, where ε can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent achieve clustering quality superior to agglomerative heuristics and other gradient-based algorithms. Finally, we highlight the flexibility of HypHC through end-to-end training in a downstream classification task.
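To make the discrete objective concrete, below is a minimal, illustrative sketch of Dasgupta's cost for a binary tree T over leaves with pairwise similarities w_ij: cost(T) = Σ_ij w_ij · |leaves(LCA(i, j))|. The nested-tuple tree encoding and the function name `dasgupta_cost` are our own illustrative choices, not the paper's implementation.

```python
def all_clusters(tree, out):
    """Collect the leaf set of every subtree of a nested-tuple binary tree.

    Leaves are any non-tuple values; internal nodes are pairs (left, right).
    Appends each subtree's leaf set to `out` and returns this subtree's set.
    """
    if not isinstance(tree, tuple):
        out.append({tree})
        return {tree}
    left = all_clusters(tree[0], out)
    right = all_clusters(tree[1], out)
    merged = left | right
    out.append(merged)
    return merged


def dasgupta_cost(tree, sim):
    """Dasgupta's cost: sum over leaf pairs of w_ij * |leaves(lca(i, j))|.

    sim: dict mapping leaf pairs (i, j) -> similarity w_ij.
    The LCA's leaf count equals the size of the smallest cluster
    (subtree leaf set) containing both leaves.
    """
    clusters = []
    all_clusters(tree, clusters)
    cost = 0.0
    for (i, j), w in sim.items():
        cost += w * min(len(c) for c in clusters if i in c and j in c)
    return cost


# Example: with high within-pair similarity, the tree ((0, 1), (2, 3))
# has lower cost than the mismatched tree ((0, 2), (1, 3)), since
# similar pairs are merged lower (their LCA covers fewer leaves).
```

Minimizing this cost rewards merging similar leaves deep in the tree; HypHC relaxes the discrete |leaves(LCA(i, j))| term using a continuous hyperbolic analogue of the LCA, so the objective becomes differentiable in the leaf embeddings.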
