Description-Driven Community Detection

Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim simply at partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information such as demographics, shopping behavior, or interests. In this context, it is important to develop mining methods that can take advantage of all available information. In the case of community detection, this means finding good communities (sets of nodes that are cohesive in the social graph) that are associated with good descriptions in terms of user information (node attributes). Having good descriptions associated with our models makes them understandable by domain experts and thus more useful in real-world applications. Another requirement, dictated by real-world applications, is to develop methods that can use any domain-specific background knowledge when it is available. In the case of community detection, the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past) that represent what the analyst is looking for (a community of similar users). Towards this goal, in this article, we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase that produces (possibly overlapping) communities, and a description induction phase that uses techniques from supervised pattern set mining. Our framework can build well-described cohesive communities starting from any given description or seed set of nodes, which makes it flexible and easy to apply in real-world applications.
Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as Delicious, Flickr, and LastFM.
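
The alternation between the two phases can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in the article: the cohesion measure (the fraction of edge endpoints that stay inside the community), the greedy single-conjunction description search that stands in for supervised pattern set mining, and all function names and the toy data are assumptions introduced here for illustration only.

```python
def cohesion(graph, community):
    """Assumed measure: share of members' edge endpoints that stay inside."""
    inside = sum(1 for u in community for v in graph[u] if v in community)
    outside = sum(1 for u in community for v in graph[u] if v not in community)
    total = inside + outside
    return inside / total if total else 0.0


def hill_climb(graph, community):
    """Add or remove one node at a time while cohesion strictly improves."""
    community = set(community)
    improved = True
    while improved:
        improved = False
        best, best_score = community, cohesion(graph, community)
        frontier = {v for u in community for v in graph[u]} - community
        candidates = [community | {v} for v in sorted(frontier)]
        if len(community) > 1:
            candidates += [community - {u} for u in sorted(community)]
        for cand in candidates:
            score = cohesion(graph, cand)
            if score > best_score:
                best, best_score, improved = cand, score, True
        community = best
    return community


def cover(attrs, description):
    """Nodes whose attribute set contains every attribute of the description."""
    return {n for n, a in attrs.items() if description <= a}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_description(attrs, community, max_len=2):
    """Greedy stand-in for pattern mining: grow a conjunction of attributes
    whose cover best matches the community under Jaccard similarity."""
    description = frozenset()
    best = jaccard(cover(attrs, description), community)
    vocabulary = sorted(set().union(*attrs.values()))
    while len(description) < max_len:
        scored = [(jaccard(cover(attrs, description | {a}), community), a)
                  for a in vocabulary if a not in description]
        if not scored:
            break
        score, attr = max(scored)
        if score <= best:
            break
        description, best = description | {attr}, score
    return description


def describe_community(graph, attrs, seed_description, max_rounds=10):
    """Alternate hill-climbing and description induction until stable."""
    description = frozenset(seed_description)
    for _ in range(max_rounds):
        climbed = hill_climb(graph, cover(attrs, description))
        new_description = induce_description(attrs, climbed)
        if new_description == description:
            break
        description = new_description
    return cover(attrs, description), description


# Toy data (hypothetical): two 4-cliques joined by the single edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
graph = {n: set() for n in range(8)}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
attrs = {0: {"jazz", "vinyl"}, 1: {"jazz", "vinyl"}, 2: {"jazz"}, 3: {"jazz"},
         4: {"rock"}, 5: {"rock"}, 6: {"rock"}, 7: {"rock", "vinyl"}}

community, description = describe_community(graph, attrs, {"vinyl"})
print(sorted(community), sorted(description))  # [0, 1, 2, 3] ['jazz']
```

In this toy run the vague seed description "vinyl" covers nodes from both cliques; hill-climbing pulls the candidate toward the first clique, and the induced description "jazz" then covers exactly that clique, so the loop stabilizes after two rounds. Starting from a seed set of nodes instead of a description would only change the first call to `hill_climb`.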
