Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering

We introduce the tree sampling divergence (TSD), an information-theoretic metric for assessing the quality of a hierarchical clustering of a graph. Any hierarchical clustering of a graph can be represented as a tree whose nodes correspond to clusters of the graph. The TSD is the Kullback-Leibler divergence between two probability distributions over the nodes of this tree: one induced by sampling edges of the graph at random, the other by sampling node pairs at random. A fundamental property of the proposed metric is that it is interpretable in terms of graph reconstruction: it quantifies the information lost when reconstructing the graph from the tree. In particular, the TSD is maximal when perfect reconstruction is feasible, i.e., when the graph has a hierarchical structure and can be reconstructed exactly from the corresponding tree. Another key property of the TSD is that it applies to any tree, not necessarily binary. In particular, it applies to trees of height 2, which correspond to usual (non-hierarchical) clustering, whose output is a partition of the set of nodes. The TSD can thus be viewed as a universal metric, applicable to any type of clustering. Moreover, the TSD can be used in practice to compress a binary tree while minimizing the information loss in terms of graph reconstruction, yielding a compact representation of the hierarchical structure of a graph. We illustrate the behavior of the TSD compared to existing metrics in experiments on both synthetic and real datasets.
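
The definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions the abstract does not state: the graph is unweighted with no self-loops, edges and unordered node pairs are each sampled uniformly, and a sampled pair is mapped to the tree node that is its lowest common ancestor. The names `tree_sampling_divergence`, `lca`, and the `parent` dictionary encoding of the tree are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def ancestors(parent, node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, u, v):
    """Lowest common ancestor of u and v in the tree given by `parent`."""
    seen = set(ancestors(parent, u))
    for x in ancestors(parent, v):
        if x in seen:
            return x
    raise ValueError("nodes are not in the same tree")


def tree_sampling_divergence(edges, leaves, parent):
    """KL divergence between edge-sampling and pair-sampling distributions.

    Assumption: uniform sampling of edges and of unordered leaf pairs,
    each pair being mapped to its lowest common ancestor in the tree.
    """
    p = Counter(lca(parent, u, v) for u, v in edges)
    q = Counter(lca(parent, u, v) for u, v in combinations(leaves, 2))
    n_edges = len(edges)
    n_pairs = len(leaves) * (len(leaves) - 1) // 2
    # q[x] > 0 whenever p[x] > 0, because every edge is also a node pair.
    return sum(
        (c / n_edges) * math.log((c / n_edges) / (q[x] / n_pairs))
        for x, c in p.items()
    )


# Toy graph: two triangles joined by a single edge.
leaves = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Height-2 tree matching the two triangles (a plain partition).
good_tree = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b", "a": "r", "b": "r"}
# Height-2 tree that splits both triangles.
bad_tree = {0: "a", 3: "a", 4: "a", 1: "b", 2: "b", 5: "b", "a": "r", "b": "r"}
# Trivial tree: every leaf hangs directly under the root.
flat_tree = {u: "r" for u in leaves}

tsd_good = tree_sampling_divergence(edges, leaves, good_tree)
tsd_bad = tree_sampling_divergence(edges, leaves, bad_tree)
tsd_flat = tree_sampling_divergence(edges, leaves, flat_tree)
```

Under these assumptions, the tree that matches the two triangles scores higher than the tree that splits them, and the trivial flat tree scores zero because both sampling schemes put all their mass on the root.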
