Community Extraction Based on Topic-Driven-Model for Clustering Users Tweets

Twitter has become a significant means by which people communicate with the world and describe their current activities, opinions, and status in short text snippets. Tweets can be analyzed automatically to derive useful information such as topics of interest, social influence, and user communities. Community extraction within social networks has been a focus of recent work in several areas. Unlike most community discovery methods, which focus on the relations between users, we aim to derive user communities based on the common topics in users' tweets. For instance, if two users regularly talk about politics in their tweets, they can be grouped in the same community, which is related to the politics topic. To achieve this goal, we propose a new approach called CETD: Community Extraction based on Topic-Driven-Model. This approach combines our proposed model, which detects the topics of users' tweets based on a semantic taxonomy, with a community extraction method based on hierarchical clustering. Our experiments with the proposed approach show the relevance of the user communities extracted based on their common topics and domains.
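The two stages described above (topic detection from a taxonomy, then hierarchical clustering of users by shared topics) can be illustrated with a minimal sketch. It is not the CETD implementation: the keyword taxonomy, the function names, the cosine distance, the average-linkage rule, and the 0.5 stopping threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy taxonomy (keyword -> topic); it stands in for the
# semantic taxonomy mentioned in the abstract, whose details are not given.
TAXONOMY = {
    "election": "politics", "senate": "politics", "vote": "politics",
    "goal": "sports", "match": "sports", "league": "sports",
}
TOPICS = sorted(set(TAXONOMY.values()))  # fixed topic order for the vectors


def topic_profile(tweets):
    """Return a user's normalized topic distribution over TOPICS."""
    counts = Counter(
        TAXONOMY[word]
        for tweet in tweets
        for word in tweet.lower().split()
        if word in TAXONOMY
    )
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TOPICS]


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def extract_communities(profiles, threshold=0.5):
    """Agglomerative (average-linkage) clustering of users by topic profile.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (an assumed cut-off, not a value from the paper).
    """
    clusters = [[user] for user in sorted(profiles)]

    def average_linkage(c1, c2):
        total = sum(
            cosine_distance(profiles[a], profiles[b]) for a in c1 for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up example tweets, for illustration only.
users = {
    "alice": ["Who will win the election", "senate vote today"],
    "bob": ["I will vote in the election"],
    "carol": ["great goal in the match"],
    "dave": ["league match tonight", "what a goal"],
}
profiles = {name: topic_profile(tweets) for name, tweets in users.items()}
communities = extract_communities(profiles)
print(communities)  # [['alice', 'bob'], ['carol', 'dave']]
```

In this toy run the two users who tweet about politics end up in one community and the two who tweet about sports in another, mirroring the politics example in the abstract. A real pipeline would need proper tokenization and a much richer taxonomy than this keyword table.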
