Topic evolution and social interactions: how authors effect research

We propose a method for discovering dependency relationships among the topics of documents shared in social networks by using latent social interactions. The method addresses the question: given a seemingly new topic, from which earlier topics did it evolve? In particular, we seek the pairwise probabilistic dependencies among topics of documents that associate social actors in a latent social network in which these documents are shared. Viewing topic evolution as a Markov chain, we estimate a Markov transition matrix over topics by leveraging social interactions and topic semantics. Metastable states of the Markov chain are then used to cluster topics. Applying the method to the CiteSeer dataset, a collection of academic documents, we show trends in research topics, how research topics are related, and which topics are stable. We also show how certain social actors, namely authors, impact these topics, and we propose new ways of evaluating author impact.
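To make the Markov-chain view concrete, the following is a minimal sketch, not the estimation procedure of the paper. It assumes a hypothetical matrix `counts` whose entry (i, j) is some nonnegative weight of evidence that topic i is followed by topic j; how social interactions and topic semantics would produce such weights is not specified here. The rows are normalized into a transition matrix, and the topics are split into two metastable groups by the sign of the eigenvector belonging to the second-largest eigenvalue, a simple spectral heuristic for nearly uncoupled chains. The function names and the toy data are illustrative assumptions.

```python
import numpy as np


def transition_matrix(counts):
    """Row-normalize nonnegative topic-to-topic weights into a stochastic matrix.

    `counts` is a hypothetical input; rows with no outgoing weight
    are turned into self-loops so every row still sums to 1.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    row_sums = counts.sum(axis=1)
    P = np.eye(n)
    nonzero = row_sums > 0
    P[nonzero] = counts[nonzero] / row_sums[nonzero, None]
    return P


def two_metastable_groups(P):
    """Split states into two groups using the second right eigenvector of P.

    For a nearly uncoupled chain this eigenvector is roughly constant
    within each weakly connected block and changes sign between blocks.
    """
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)       # largest eigenvalue (1) first
    second = eigvecs[:, order[1]].real
    if second[0] < 0:                       # fix the sign so state 0 gets label 0
        second = -second
    return (second < 0).astype(int)


# Toy data (assumed): topics 0-1 and topics 2-3 rarely lead to each other.
counts = [
    [8, 2, 0, 0],
    [3, 6, 1, 0],
    [0, 1, 7, 2],
    [0, 0, 2, 8],
]
P = transition_matrix(counts)
labels = two_metastable_groups(P)
print(P)
print(labels)
```

In this toy chain the second-largest eigenvalue lies close to 1, which is the usual signal that the chain has two slowly mixing groups of states. Splitting into more than two groups would need additional eigenvectors and a proper clustering step, which this sketch omits.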
