Finding topic trends in digital libraries
We propose a generative model based on latent Dirichlet allocation for mining distinct topics in document collections by integrating the temporal ordering of documents into the generative process. The document collection is divided into time segments, and the topics discovered in each segment are propagated to influence topic discovery in subsequent segments. We conduct experiments on a collection of academic papers from the CiteSeer repository. We augment the text corpus with user queries and tags, and we integrate the citation graph to boost the weight of topical terms. The experimental results show that the segmented topic model can effectively detect distinct topics and their evolution over time.
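The propagation of topics from one time segment to the next can be illustrated with a small sketch. The abstract does not specify the propagation mechanism, so the code below makes an assumption: the topic-word counts learned in one segment are scaled and reused as Dirichlet pseudo-counts for the next segment's topic-word prior. The function names (`gibbs_lda`, `segmented_topics`) and the parameters `base_beta` and `carry` are hypothetical and chosen only for illustration. The sampler is a plain collapsed Gibbs sampler for LDA, not the authors' implementation, and it omits the query, tag, and citation-graph weighting.

```python
import random


def gibbs_lda(docs, vocab, num_topics, beta_prior, alpha=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a per-topic, per-word prior.

    beta_prior[k][w] is the Dirichlet pseudo-count of word w in topic k.
    Returns the topic-word count tables as a list of dicts.
    """
    rng = random.Random(seed)
    n_kw = [{w: 0 for w in vocab} for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                     # tokens per topic
    n_dk = [[0] * num_topics for _ in docs]                    # doc-topic counts
    beta_sum = [sum(beta_prior[k].values()) for k in range(num_topics)]

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_kw[k][w] += 1
            n_k[k] += 1
            n_dk[d][k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_kw[k][w] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta_prior[t][w])
                    / (n_k[t] + beta_sum[t])
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_kw[k][w] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    return n_kw


def segmented_topics(segments, num_topics, base_beta=0.01, carry=0.5, **kwargs):
    """Fit one model per time segment, in temporal order.

    Assumption: the previous segment's topic-word counts, scaled by `carry`,
    are added to the next segment's prior, so topic k keeps its identity.
    """
    vocab = sorted({w for seg in segments for doc in seg for w in doc})
    prior = [{w: base_beta for w in vocab} for _ in range(num_topics)]
    results = []
    for seg in segments:
        counts = gibbs_lda(seg, vocab, num_topics, prior, **kwargs)
        results.append(counts)
        prior = [
            {w: base_beta + carry * counts[k][w] for w in vocab}
            for k in range(num_topics)
        ]
    return results


# Toy usage: two time segments, each a list of tokenized documents.
segments = [
    [["topic", "model", "lda"], ["web", "search", "crawl"]],
    [["topic", "model", "time"], ["web", "link", "graph"]],
]
for t, counts in enumerate(segmented_topics(segments, num_topics=2, iters=20)):
    for k, table in enumerate(counts):
        top = sorted(table, key=table.get, reverse=True)[:3]
        print(f"segment {t}, topic {k}: {top}")
```

Because the prior for segment t+1 is built from the counts of segment t, topic index k refers to the same evolving topic across segments, which is what makes it possible to read off a trend. Setting `carry` to 0 decouples the segments and reduces the sketch to independent LDA runs.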