Highly Efficient Mining of Overlapping Clusters in Signed Weighted Networks

In many practical contexts, networks are weighted: their links carry numerical weights representing the strength of a relationship or the intensity of interaction between nodes. Moreover, a link's weight can be positive or negative, depending on the nature of the relationship or interaction between the connected nodes. Existing network clustering methods, however, are not well suited to very large signed weighted networks. In this paper, we present a novel method called LPOCSIN (short for "Linear Programming based Overlapping Clustering on Signed Weighted Networks") for efficient mining of overlapping clusters in signed weighted networks. Unlike existing methods, which rely on computationally expensive cluster cohesiveness measures, LPOCSIN uses a simple yet effective one. Using this measure, we transform the cluster assignment problem into a series of alternating linear programs, and further propose a highly efficient procedure for solving those alternating problems. We evaluate LPOCSIN and other state-of-the-art methods in extensive experiments covering a wide range of synthetic and real networks. The experiments show that LPOCSIN significantly outperforms the other methods in recovering ground-truth clusters while being an order of magnitude faster than the most efficient state-of-the-art method.
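To make the setting concrete, the following Python sketch shows one way a signed weighted network and an overlapping cluster assignment might be represented. It is an illustration only and not the LPOCSIN procedure: the abstract does not define the cohesiveness measure or the linear programs, so the `cohesion` score (sum of signed weights inside a cluster) and the `reassign` rule (a node joins every cluster to which its net signed weight is positive) are assumptions chosen for simplicity, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v, signed_weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def cohesion(adj, cluster):
    """Assumed measure: sum of signed weights over links inside the cluster."""
    members = set(cluster)
    # Each internal link is seen from both endpoints, hence the division by 2.
    doubled = sum(w for u in members for v, w in adj[u].items() if v in members)
    return doubled / 2


def reassign(adj, clusters):
    """One sweep with cluster contents held fixed (assumed rule, not the paper's).

    A node is placed in every cluster whose current members it is, on balance,
    positively linked to, so a node may end up in several clusters or in none.
    """
    nodes = list(adj)
    updated = []
    for members in clusters:
        members = set(members)
        updated.append({
            u for u in nodes
            if sum(w for v, w in adj[u].items() if v in members) > 0
        })
    return updated


# Hypothetical toy network: positive links within groups, negative across them.
edges = [
    ("a", "b", 2.0), ("b", "c", 1.5), ("a", "c", 1.0),
    ("c", "d", 1.0), ("d", "e", 2.0), ("c", "e", 0.5),
    ("a", "e", -3.0), ("b", "d", -1.0),
]
adj = build_adjacency(edges)
clusters = reassign(adj, [{"a", "b"}, {"d", "e"}])
scores = [cohesion(adj, c) for c in clusters]
```

In this toy example node `c` has net positive weight to both seed groups, so it ends up in both clusters, which is the kind of overlap the abstract refers to.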
