Semantically meaningful group detection within sub-communities of Twitter blogosphere: A topic oriented multi-objective clustering approach

This paper addresses the problem of detecting semantically meaningful groups within a sub-community of Twitter micro-bloggers by combining topic modeling with multi-objective clustering. The proposed group detection method is anchored on the Latent Dirichlet Allocation (LDA) topic modeling technique and aims to identify clusters of Twitter users that are optimal in terms of both spatial and topical compactness. Specifically, group detection is formulated as a multi-objective optimization problem that takes into account two complementary cluster formation directives. The first objective, related to spatial compactness, is achieved by minimizing the overall deviation of cluster members from their corresponding cluster centers. The second, related to topical compactness, is achieved by minimizing the portion of probability mass that the corresponding cluster centroids assign to low-probability topics. In our approach, optimization is performed with a multi-objective genetic algorithm, which yields a variety of cluster structures that are significantly more interpretable than the cluster assignments obtained with traditional single-objective clustering algorithms.
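To make the two objectives concrete, the following is a minimal sketch rather than the paper's implementation. It assumes each user is represented by an LDA topic-proportion vector (a row of `theta`), that cluster centers are the means of their members, that deviation is measured with Euclidean distance, and that "low-probability topics" means every topic outside a centroid's `top_m` most probable ones. The function names, the `top_m` parameter, and the `dominates` helper are all hypothetical choices made for illustration.

```python
import numpy as np

def overall_deviation(theta, labels, k):
    """Spatial compactness (assumed form): sum of Euclidean distances
    from each user's topic vector to the mean of its cluster."""
    total = 0.0
    for c in range(k):
        members = theta[labels == c]
        if len(members) == 0:
            continue  # empty clusters contribute nothing
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return float(total)

def low_topic_mass(theta, labels, k, top_m=1):
    """Topical compactness (assumed form): for each cluster centroid,
    the probability mass lying outside its top_m most probable topics,
    summed over clusters."""
    total = 0.0
    for c in range(k):
        members = theta[labels == c]
        if len(members) == 0:
            continue
        centroid = members.mean(axis=0)
        top = np.sort(centroid)[-top_m:]  # the top_m largest entries
        total += 1.0 - top.sum()
    return float(total)

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b on every
    objective and strictly better on at least one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

# Toy data: four users over two topics, two candidate partitions.
theta = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
good = np.array([0, 0, 1, 1])   # groups users with identical topics
mixed = np.array([0, 1, 0, 1])  # mixes the two topics in each cluster

f_good = (overall_deviation(theta, good, 2), low_topic_mass(theta, good, 2))
f_mixed = (overall_deviation(theta, mixed, 2), low_topic_mass(theta, mixed, 2))
```

On this toy data the topically pure partition scores zero on both objectives and therefore dominates the mixed one. A multi-objective genetic algorithm would evaluate such objective pairs for each candidate partition and retain the non-dominated set, which is why the approach returns several cluster structures rather than a single one.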
