Paper information - Interpretable Likelihood for Vector Representable Topic


Automatic topic extraction from a large collection of documents is useful for capturing an overall picture of the collection or for classifying the documents. An important issue here is evaluating how interpretable the extracted topics, which are sets of documents, are to humans. Targeting extraction methods whose topics can be represented as vectors, e.g., Latent Semantic Analysis (LSA), we attempt to formulate an interpretable likelihood of an extracted topic using manually derived topics. We evaluated this likelihood on English news articles, using LSA, PCA and spherical k-means for topic extraction. The results show that the likelihood can be applied as a filter to select meaningful topics.
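The abstract does not state the likelihood formula, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, what a "vector representable" topic looks like when LSA is computed as a truncated SVD of a term-document matrix. The toy matrix, the vocabulary, the helper names `lsa_topics` and `cosine`, and the use of cosine similarity against a hand-written topic vector are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np

# Toy term-document count matrix (rows: terms, columns: documents).
# Vocabulary and counts are invented for illustration only.
terms = ["election", "vote", "goal", "match"]
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])


def lsa_topics(term_doc, k):
    """Return the top-k left singular vectors (topics in term space),
    their singular values, and the document coordinates."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


topic_vectors, strengths, doc_coords = lsa_topics(X, k=2)

# Hypothetical manually derived topic ("politics"), expressed in term space.
manual_topic = np.array([1.0, 1.0, 0.0, 0.0])

# Singular vectors are defined only up to sign, so compare absolute values.
for j in range(topic_vectors.shape[1]):
    score = abs(cosine(topic_vectors[:, j], manual_topic))
    print(f"extracted topic {j}: |cosine| with manual topic = {score:.3f}")
```

In this sketch each extracted topic is a single vector over terms, which is what makes a direct numerical comparison with a manually derived topic possible. Any score used as a filter in practice would need the definition given in the paper itself.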

Masayuki Numao | Masahiro Kimura | Ken-ichi Fukui | Kazumi Saito
