A Color-blind 3-Approximation for Chromatic Correlation Clustering and Improved Heuristics

Chromatic Correlation Clustering (CCC) models the clustering of objects with categorical pairwise relationships. It can be viewed as clustering the vertices of a graph whose edges carry labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well-studied Correlation Clustering (CC) problem, motivated by real-world applications in data mining, social networks, and bioinformatics. We make both theoretical and practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the well-known Pivot algorithm for CC. We show that Pivot, when simply run color-blind, is also a linear-time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear-time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline for comparison with other heuristics, its blunt color-blindness makes it perform poorly in practice. We therefore develop a color-sensitive, practical heuristic, which we call Greedy Expansion, that empirically outperforms all heuristics proposed for CCC so far on both real-world and synthetic instances. Further, we propose a novel generalization of CCC that allows multi-labelled edges. We argue that it is more suitable for many of the real-world applications, and we extend our results to this model.
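To make the color-blind idea concrete, the following is a minimal Python sketch, not the authors' implementation. It runs the standard Pivot procedure (pick a random unclustered vertex, group it with its unclustered neighbors, repeat) while ignoring edge colors entirely. The abstract does not say how a cluster receives its color, so the majority-color rule used here is an assumption, as are the function name `color_blind_pivot` and the input format (a dict mapping vertex pairs to colors).

```python
import random
from collections import Counter


def color_blind_pivot(vertices, colored_edges, rng=None):
    """Sketch of Pivot run without looking at edge colors.

    vertices: iterable of hashable vertex ids.
    colored_edges: dict mapping (u, v) tuples to a color label
        (hypothetical input format chosen for this example).
    Returns a list of (members, color) pairs.
    """
    rng = rng or random.Random()

    # Build adjacency lists while ignoring colors.
    adj = {v: set() for v in vertices}
    for (u, v) in colored_edges:
        adj[u].add(v)
        adj[v].add(u)

    # Visiting vertices in a uniformly random order and skipping clustered
    # ones is equivalent to repeatedly drawing a random unclustered pivot.
    order = list(adj)
    rng.shuffle(order)

    unclustered = set(adj)
    cluster_of = {}
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue
        members = {pivot} | (adj[pivot] & unclustered)
        unclustered -= members
        for v in members:
            cluster_of[v] = len(clusters)
        clusters.append(members)

    # ASSUMPTION: each cluster takes the most frequent color among its
    # internal edges; the source does not specify this step.
    counts = [Counter() for _ in clusters]
    for (u, v), color in colored_edges.items():
        if cluster_of[u] == cluster_of[v]:
            counts[cluster_of[u]][color] += 1

    return [
        (members, cnt.most_common(1)[0][0] if cnt else None)
        for members, cnt in zip(clusters, counts)
    ]
```

On a red triangle plus one isolated vertex, every choice of pivot yields the same clustering: the triangle as one red cluster, and the isolated vertex as a singleton with no color. One pass over the edges suffices for both the adjacency construction and the color assignment, which keeps the sketch linear in the input size.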
