Probabilistic models for discovering e-communities

The increasing amount of communication between individuals in electronic formats (e.g., email, instant messaging, and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized the topology of the social network (SN), as measured by communication frequencies, while ignoring the semantic information in SNs. In this paper, we propose two generative Bayesian models for semantic community discovery in SNs, combining probabilistic modeling with community detection. To simulate the generative models, we propose an EnF-Gibbs sampling algorithm that addresses the efficiency and performance problems of traditional methods. Experimental studies on the Enron email corpus show that our approach successfully detects communities of individuals and, in addition, provides semantic topic descriptions of these communities.
