Identifying semantically meaningful sub-communities within the Twitter blogosphere

This paper addresses the problem of detecting semantically meaningful groups within a sub-community of Twitter micro-bloggers, using an approach that combines topic modeling with multi-objective clustering. The proposed group detection method is anchored on the Latent Dirichlet Allocation (LDA) topic modeling technique and aims to identify clusters of Twitter users that are optimal in terms of both spatial and topical compactness. Specifically, group detection is formulated as a multi-objective optimization problem that takes into account two complementary cluster formation directives. The first objective, related to spatial compactness, is achieved by minimizing the overall deviation of users from their corresponding cluster centers. The second, related to topical compactness, is achieved by minimizing the portion of probability mass that the corresponding cluster centroids assign to low-probability topics. In our approach, optimization is performed with a multi-objective genetic algorithm, which yields a variety of cluster structures that are significantly more interpretable than the cluster assignments obtained with traditional single-objective clustering algorithms.
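The two objectives described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each user is already represented by an LDA topic distribution, takes a cluster centroid to be the mean of its members' distributions, measures deviation with Euclidean distance, and treats every topic outside a centroid's `top_m` most probable topics as "low probability". The names `overall_deviation`, `low_topic_mass`, `dominates`, and `top_m` are hypothetical, and the genetic algorithm itself is omitted; only the scoring and the Pareto comparison it would rely on are shown.

```python
import math

def centroids(points, labels):
    """Mean topic vector of each cluster, keyed by cluster label."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: [sum(col) / len(members) for col in zip(*members)]
            for c, members in groups.items()}

def overall_deviation(points, labels):
    """Spatial compactness: summed Euclidean distance of users to their centroid."""
    cents = centroids(points, labels)
    return sum(math.dist(p, cents[c]) for p, c in zip(points, labels))

def low_topic_mass(points, labels, top_m=1):
    """Topical compactness: average centroid mass outside its top_m topics.

    Assumption: "low probability topics" means all but the top_m topics.
    """
    cents = centroids(points, labels)
    tails = [sum(sorted(cen, reverse=True)[top_m:]) for cen in cents.values()]
    return sum(tails) / len(tails)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

# Toy topic distributions for four users over three topics (made-up values).
users = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10],
         [0.10, 0.80, 0.10], [0.05, 0.90, 0.05]]
grouped_by_topic = [0, 0, 1, 1]   # users with similar topics share a cluster
mixed = [0, 1, 0, 1]              # each cluster mixes two dominant topics

def score(labels):
    return (overall_deviation(users, labels), low_topic_mass(users, labels))

print(score(grouped_by_topic), score(mixed))
print(dominates(score(grouped_by_topic), score(mixed)))
```

In a multi-objective genetic algorithm, a comparison such as `dominates` would typically be used to retain the non-dominated candidate clusterings, which is why the method produces a set of alternative cluster structures rather than a single assignment.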

[38]  Andrew McCallum,et al.  Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email , 2007, J. Artif. Intell. Res..