Detecting topic evolution in scientific literature: how can citations help?

Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work models each paper simply as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, an important and inherent element of scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations and develop citation-aware approaches. We propose an iterative topic evolution learning framework that adapts the Latent Dirichlet Allocation model to the citation network, and we develop a novel inheritance topic model. We evaluate the effectiveness and efficiency of our approaches and compare them with state-of-the-art approaches on a large collection of more than 650,000 research papers from the last 16 years, along with the citation network enabled by CiteSeerX. The results clearly show that citations can help to understand topic evolution better.
