Determining the topic of a document is necessary for understanding its content efficiently. Latent Dirichlet Allocation (LDA) is a method for analyzing topics: each topic is treated as an unobservable variable that defines a probability distribution over words, and a topic can be interpreted through the list of words that appear in it with high probability. This approach works well when the topics are drawn from a large collection of documents with diverse contents. However, when the documents are a set of article abstracts retrieved by a keyword search, their contents are limited and similar, and the topics produced by conventional LDA are difficult to interpret. We propose a method for estimating representative words of each topic from an LDA result. Experimental results show that our method provides better information for interpreting a topic than LDA does.

Keywords: LDA; topic analysis; Gibbs sampling.
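As a point of reference for the conventional interpretation step described in the abstract, the sketch below fits LDA to a small corpus and lists the most probable words of each topic. It assumes scikit-learn (which uses online variational Bayes rather than collapsed Gibbs sampling) and a hypothetical toy corpus; it illustrates the baseline top-word interpretation only, not the proposed representative-word estimation.

```python
# Minimal sketch of conventional LDA topic interpretation: fit LDA and list the
# highest-probability words per topic. The toy documents and parameter values
# are illustrative assumptions, not the paper's data or proposed method.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus standing in for article abstracts found by keyword search.
docs = [
    "topic models assign probability distributions over words",
    "keyword search returns abstracts with similar limited content",
    "gibbs sampling estimates latent topic assignments for words",
    "interpreting a topic uses its most probable words",
]

# Bag-of-words representation of the documents.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit LDA with an assumed number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Interpret each topic by its top-probability words, as conventional LDA does.
vocab = vectorizer.get_feature_names_out()
for k, word_weights in enumerate(lda.components_):
    top = word_weights.argsort()[::-1][:5]
    print(f"Topic {k}: " + ", ".join(vocab[i] for i in top))
```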