The Author-Topic-Community model for author interest profiling and community discovery

In this paper, we propose a generative model, the author-topic-community (ATC) model, for representing a corpus of linked documents. In the ATC model, each author is associated with two distributions, a topic distribution and a community distribution, which serve as the model parameters. We derive a learning algorithm based on variational inference to estimate these parameters; during estimation, the two distributions reinforce each other. We compare the ATC model with two related generative models, first on synthetic data sets and then on four real data sets: a research community data set, a blog data set, a news-sharing data set, and a microblogging data set. The empirical results show that the ATC model outperforms the existing models on tasks such as author interest profiling and author community discovery. We also demonstrate how the inferred ATC model can be used to characterize the roles of users/authors in online communities.
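The abstract says that each author has a topic distribution and a community distribution, and that the model supports author interest profiling and community discovery. The Python sketch below shows one simple way to read both tasks off inferred per-author parameters. It is not the paper's inference algorithm or its exact use of the parameters. The `AuthorParams` container, the helper names, the argmax-based hard community assignment, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuthorParams:
    """Hypothetical container for one author's inferred parameters."""
    theta: list[float]  # distribution over K topics (assumed to sum to 1)
    pi: list[float]     # distribution over C communities (assumed to sum to 1)


def top_topics(params: AuthorParams, n: int = 2) -> list[int]:
    """Interest profile: indices of the author's n most probable topics."""
    order = sorted(range(len(params.theta)),
                   key=lambda k: params.theta[k], reverse=True)
    return order[:n]


def dominant_community(params: AuthorParams) -> int:
    """Assumed hard assignment: the author's most probable community."""
    return max(range(len(params.pi)), key=lambda c: params.pi[c])


def group_by_community(authors: dict[str, AuthorParams]) -> dict[int, list[str]]:
    """Community discovery: group authors by their dominant community."""
    groups: dict[int, list[str]] = {}
    for name, params in authors.items():
        groups.setdefault(dominant_community(params), []).append(name)
    return groups


# Toy parameters standing in for the output of inference (hypothetical values).
authors = {
    "alice": AuthorParams(theta=[0.7, 0.2, 0.1], pi=[0.9, 0.1]),
    "bob":   AuthorParams(theta=[0.1, 0.3, 0.6], pi=[0.2, 0.8]),
    "carol": AuthorParams(theta=[0.5, 0.4, 0.1], pi=[0.6, 0.4]),
}

if __name__ == "__main__":
    for name, params in authors.items():
        print(name, "top topics:", top_topics(params))
    print("communities:", group_by_community(authors))
```

Because the model assigns each author a full community distribution, taking the argmax discards mixed membership. Thresholding `pi` instead would let an author belong to several communities.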
