Clustering in graphs and hypergraphs with categorical edge labels

Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, there are comparatively few rigorous methods for extracting insight from such representations. Here, we develop a computational framework for clustering hypergraphs with categorical edge labels (that is, different interaction types), where clusters correspond to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time using an algorithm based on minimum cuts. Minimizing the objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that come with theoretical guarantees on cluster quality. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.
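The polynomial-time two-label case can be made concrete with a small sketch. It assumes one natural formalization of the objective that the abstract does not spell out: every node receives exactly one label, and a (hyper)edge counts as a mistake whenever any of its nodes is assigned a label different from the edge's own label. Under that assumption, one standard reduction puts label 0 on the source side and label 1 on the sink side of an s-t cut and adds one auxiliary node per edge, so that the capacity of a minimum cut equals the minimum number of mistakes. The helper names (`max_flow_min_cut`, `two_label_cec`, `count_mistakes`) are hypothetical, and the max-flow routine is a plain Edmonds-Karp implementation chosen only to keep the example self-contained; it is not presented as the solver used by the authors.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.

    Returns the flow value and the set of nodes reachable from s in the
    final residual graph, i.e. the source side of a minimum s-t cut.
    """
    res = defaultdict(lambda: defaultdict(float))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c      # forward residual capacity
            res[v][u] += 0.0    # make sure the reverse arc exists
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def two_label_cec(nodes, edges):
    """Exactly minimize the assumed two-label objective with one s-t min cut.

    edges: list of (tuple_of_nodes, label) pairs with label in {0, 1}.
    Returns (minimum number of mistakes, dict mapping node -> label).
    """
    s, t = ("source",), ("sink",)
    cap = defaultdict(dict)
    for i, (members, label) in enumerate(edges):
        aux = ("edge", i)  # one auxiliary node per (hyper)edge
        if label == 0:
            # In a min cut, s->aux (cost 1) is cut iff some member is on the sink side.
            cap[s][aux] = 1.0
            for u in members:
                cap[aux][u] = INF
        else:
            # In a min cut, aux->t (cost 1) is cut iff some member is on the source side.
            cap[aux][t] = 1.0
            for u in members:
                cap[u][aux] = INF
    value, source_side = max_flow_min_cut(cap, s, t)
    labels = {u: 0 if u in source_side else 1 for u in nodes}
    return int(value), labels


def count_mistakes(edges, labels):
    """Number of edges containing at least one node whose label differs from the edge's."""
    return sum(any(labels[u] != label for u in members) for members, label in edges)


# Toy example: four nodes, two label-0 hyperedges and two label-1 hyperedges.
nodes = ["a", "b", "c", "d"]
edges = [(("a", "b", "c"), 0), (("a", "b"), 0), (("c", "d"), 1), (("b", "c", "d"), 1)]
best, labels = two_label_cec(nodes, edges)
print(best, labels, count_mistakes(edges, labels))
```

The reduction adds one auxiliary node and one infinite-capacity arc per edge member, so the auxiliary graph grows linearly in the total hyperedge size; in practice one would swap the hand-written Edmonds-Karp loop for a library max-flow solver.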
