Multi-document summarization based on a hierarchical topic model

In this paper, we introduce an extractive multi-document summarization method based on the hierarchical Latent Dirichlet Allocation (hLDA) topic model and sentence compression. hLDA is a representative generative probabilistic model that can not only mine latent topics from large amounts of discrete data but also organize those topics into a hierarchy, enabling deeper semantic analysis. We also apply sentence compression to refine the extracted summaries and make them more concise. We use the TAC 2010 data sets as the test corpus and evaluate our summaries with ROUGE. The evaluations show that our method outperforms several traditional methods.
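ROUGE scores a system summary by its n-gram overlap with human reference summaries. As a rough illustration of what ROUGE-N recall measures, here is a minimal sketch. It is not the official ROUGE toolkit and does not claim to reproduce the exact configuration used in the experiments. The tokenization (lowercase whitespace splitting, with no stemming or stopword removal) and the function names `ngrams` and `rouge_n_recall` are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall (hypothetical helper, not the official tool).

    Sums clipped n-gram matches between the candidate and each reference,
    then divides by the total number of reference n-grams.
    Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Each reference n-gram counts as matched at most as many times
        # as it appears in the candidate summary.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    refs = ["the cat is on the mat"]
    print(rouge_n_recall(cand, refs, n=1))  # 5 of 6 reference unigrams matched
    print(rouge_n_recall(cand, refs, n=2))  # 3 of 5 reference bigrams matched
```

Because the score is recall-oriented, it rewards summaries that cover the reference content. This is one reason a compression step, which removes words that are unlikely to match reference n-grams, can leave room for more informative sentences under a fixed length limit.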