Topic evolution based on LDA and HMM and its application in stem cell research

This paper analyses topic segmentation based on the LDA (Latent Dirichlet Allocation) model and applies it, together with the HMM (Hidden Markov Model) and co-occurrence theory, to the segmentation and evolution of topics in the stem cell research literature indexed in PubMed from 2001 to 2012. Stem cell research topics were obtained with LDA, and experts judged these topics to test the feasibility of the model's classification. The correlation between topics was then analysed. HMM was used to predict the trend of topic evolution across years, and a time series map was used to visualize the evolutionary relationships among the stem cell topics.
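As an illustration of the HMM prediction step only, the following is a minimal sketch of how a hidden Markov model could forecast whether a topic's share of the literature rises or falls in the next year. It is not the paper's implementation: the two hidden states, the "up"/"down" observations, all probabilities, and the function names `forward_filter` and `predict_next` are assumptions chosen for the example.

```python
# Hypothetical two-state HMM for one topic's yearly trend.
# All states, observations, and probabilities are illustrative assumptions.
states = ["rising", "declining"]
start = {"rising": 0.5, "declining": 0.5}
trans = {
    "rising": {"rising": 0.8, "declining": 0.2},
    "declining": {"rising": 0.3, "declining": 0.7},
}
emit = {
    "rising": {"up": 0.9, "down": 0.1},
    "declining": {"up": 0.2, "down": 0.8},
}


def forward_filter(observations):
    """Return P(hidden state | observations so far) using the normalized forward algorithm."""
    belief = dict(start)
    for i, obs in enumerate(observations):
        if i > 0:
            # Propagate the belief one year ahead through the transition model.
            belief = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
        # Weight by the likelihood of the observed change, then renormalize.
        belief = {s: belief[s] * emit[s][obs] for s in states}
        total = sum(belief.values())
        belief = {s: v / total for s, v in belief.items()}
    return belief


def predict_next(observations):
    """Return the predictive distribution over next year's observation."""
    belief = forward_filter(observations)
    next_state = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
    return {o: sum(next_state[s] * emit[s][o] for s in states) for o in ("up", "down")}


# Year-over-year change in a topic's share of documents (made-up sequence).
yearly_changes = ["up", "up", "down", "up"]
print(predict_next(yearly_changes))
```

In a real pipeline the observation sequence would be derived from the yearly LDA topic proportions, and the transition and emission probabilities would be estimated from data rather than fixed by hand.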
