Interpretable Likelihood for Vector Representable Topic

Automatic topic extraction from a large collection of documents is useful for capturing an overall picture of the collection or for classifying the documents. An important issue here is evaluating how interpretable the extracted topics, which are sets of documents, are to humans. Focusing on topic extraction methods whose topics can be represented as vectors, e.g., Latent Semantic Analysis (LSA), we attempt to formulate the interpretable likelihood of an extracted topic using manually derived topics. We evaluated this likelihood on topics extracted from English news articles using LSA, Principal Component Analysis (PCA), and spherical k-means. The results show that this likelihood can be applied as a filter to select meaningful topics.