Multi-Document Automatic Summarization Based on the Hierarchical Topics

A concept of hierarchical topics is proposed for the multi-document automatic summarization task, in which a multi-layer topic tree is used to represent the text set. Each node in the topic tree represents a specific topic and contains multiple similar sentences from the text set. This structure can accurately describe the similarity between sentences at different levels of granularity, so it reflects the actual content of the text set better than a single-layer topic set does. It can also be used to find the important sentences within the important topics, and these sentences compose the summary of the text set. Concretely, a series of algorithms is proposed, covering tree building, tree-based key sentence extraction, and summary generation. A set of experiments evaluates the capability of the summarization system and shows good results.
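The abstract does not specify how the tree is built or how sentences are scored. The following minimal Python sketch illustrates the general idea only, under assumptions that are not taken from the paper. Sentences are bag-of-words vectors compared by cosine similarity. The topic tree is built by average-linkage agglomerative clustering, a standard technique swapped in here and not necessarily the authors' tree-building algorithm. Topics are read off by cutting the tree at a similarity `threshold`. Topic importance is approximated by topic size, and each topic's key sentence is the member most similar to the rest of its topic. The names `TopicNode`, `build_topic_tree`, `topics_at`, `summarize`, `k`, and `threshold` are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(sentence: str) -> Counter:
    """Bag-of-words term counts over lowercased alphanumeric tokens (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class TopicNode:
    sentences: list[int]  # indices of the sentences covered by this topic
    children: list["TopicNode"] = field(default_factory=list)
    similarity: float = 1.0  # similarity at which the children were merged (1.0 for leaves)


def build_topic_tree(vectors: list[Counter]) -> TopicNode:
    """Hypothetical tree builder: average-linkage agglomerative clustering.

    Every merge creates a parent topic, so upper nodes are coarse topics and
    lower nodes are fine-grained ones. Naive and roughly cubic: a sketch only.
    """
    nodes = [TopicNode([i]) for i in range(len(vectors))]

    def avg_sim(x: TopicNode, y: TopicNode) -> float:
        pairs = [(i, j) for i in x.sentences for j in y.sentences]
        return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(nodes) > 1:
        # Merge the most similar pair of current topics.
        best = max(
            ((p, q) for p in range(len(nodes)) for q in range(p + 1, len(nodes))),
            key=lambda pq: avg_sim(nodes[pq[0]], nodes[pq[1]]),
        )
        p, q = best
        merged = TopicNode(nodes[p].sentences + nodes[q].sentences,
                           [nodes[p], nodes[q]], avg_sim(nodes[p], nodes[q]))
        nodes = [n for k, n in enumerate(nodes) if k not in best] + [merged]
    return nodes[0]


def topics_at(node: TopicNode, threshold: float) -> list[TopicNode]:
    """Cut the tree: keep the highest nodes whose members were merged at >= threshold."""
    if not node.children or node.similarity >= threshold:
        return [node]
    return [t for child in node.children for t in topics_at(child, threshold)]


def summarize(sentences: list[str], k: int = 2, threshold: float = 0.3) -> list[str]:
    """Pick one key sentence from each of the k largest topics, in original order."""
    vectors = [tokenize(s) for s in sentences]
    root = build_topic_tree(vectors)
    # Assumed importance measure: a topic covering more sentences is more important.
    topics = sorted(topics_at(root, threshold), key=lambda t: len(t.sentences), reverse=True)[:k]
    chosen = []
    for topic in topics:
        # Assumed key-sentence rule: the member most similar to the rest of its topic.
        chosen.append(max(topic.sentences, key=lambda i: sum(
            cosine(vectors[i], vectors[j]) for j in topic.sentences if j != i)))
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "The storm flooded the river valley.",
    "Heavy rain from the storm flooded the valley.",
    "The city council approved a new budget.",
    "The council budget adds funds for schools.",
]
print(summarize(docs, k=2, threshold=0.3))
```

With these four sentences, the two storm sentences and the two council sentences merge first into separate subtrees, and the root joins them at a lower similarity. Cutting at 0.3 therefore yields two topics, and the summary takes one sentence from each. A real system would likely use better sentence representations (for example TF-IDF weights) and a richer importance score than topic size.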