A subtopic-enriched MMR approach to sentence ranking for Chinese multi-document summarization

In this paper, we present SEMMR, a novel subtopic-enriched sentence ranking method for Chinese multi-document summarization, derived from Maximal Marginal Relevance (MMR). MMR is one of the most popular ranking algorithms for balancing topical relevance and content redundancy in a unified framework, and it has been widely employed in text retrieval and document summarization. For the multi-document summarization task, existing MMR-based approaches usually incorporate the topical relevance between each sentence and the main topic directly into the sentence ranking process, while ignoring latent subtopic information of finer granularity. In practice, a document set on a main topic usually consists of a few implicit subtopics, and different subtopics may have unequal impact on sentence ranking. Specifically, sentences that are closer to subtopics near the main topic are deemed more relevant than sentences related to subtopics far from the main topic. To address this issue and account for the impact of subtopics on sentence ranking, this paper extends the traditional MMR algorithm by integrating subtopical relevance as well as sentence-to-subtopic proximity into the unified ranking process. Preliminary experimental results indicate the effectiveness of the proposed method.
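The classic MMR criterion greedily selects the next sentence s maximizing λ·Sim1(s, Q) − (1 − λ)·max over already-selected s_j of Sim2(s, s_j). The Python sketch below illustrates how a subtopic term could be folded into that criterion. The relevance formula is a hypothetical illustration, not the SEMMR formulation, which this abstract does not specify: each subtopic is weighted by its cosine similarity to the main topic, the sentence's proximity to each subtopic is averaged under those weights, and the result is mixed with direct main-topic relevance through an assumed parameter `beta`. Sentences, the main topic, and subtopics are assumed to be sparse term-weight vectors (for example, TF-IDF over segmented Chinese words); how subtopics are discovered is out of scope here.

```python
import math
from typing import Dict, List, Sequence

# Sparse term-weight vector, e.g. TF-IDF over segmented words (assumption).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def subtopic_score(sent: Vector, topic: Vector, subtopics: Sequence[Vector]) -> float:
    """HYPOTHETICAL: sentence-to-subtopic proximity, weighted by each
    subtopic's similarity to the main topic (weights normalized to sum to 1)."""
    weights = [cosine(t, topic) for t in subtopics]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * cosine(sent, t) for w, t in zip(weights, subtopics)) / total


def relevance(sent: Vector, topic: Vector, subtopics: Sequence[Vector],
              beta: float = 0.5) -> float:
    """HYPOTHETICAL mix of direct main-topic relevance and subtopic score."""
    return beta * cosine(sent, topic) + (1 - beta) * subtopic_score(sent, topic, subtopics)


def mmr_rank(sentences: Sequence[Vector], topic: Vector, subtopics: Sequence[Vector],
             lam: float = 0.7, beta: float = 0.5, k: int = 3) -> List[int]:
    """Greedy MMR: repeatedly pick the sentence maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected sentences."""
    rel = [relevance(s, topic, subtopics, beta) for s in sentences]
    selected: List[int] = []
    remaining = list(range(len(sentences)))  # kept in ascending index order
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((cosine(sentences[i], sentences[j]) for j in selected),
                             default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)  # ties go to the lower index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    topic = {"a": 1.0}
    # The second subtopic is orthogonal to (far from) the main topic.
    subtopics = [{"a": 1.0, "b": 1.0}, {"c": 1.0}]
    sentences = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0},
                 {"c": 1.0}, {"a": 1.0, "b": 0.5}]
    print(mmr_rank(sentences, topic, subtopics, lam=1.0, k=2))  # -> [3, 0]
    print(mmr_rank(sentences, topic, subtopics, lam=0.5, k=2))  # -> [3, 2]
```

With `lam=1.0` the ranking is purely relevance-based, so sentence 0, a near-duplicate of sentence 3, is chosen second; at `lam=0.5` the redundancy penalty pushes it below sentence 2. Sentence 2 contributes no subtopic relevance because its only subtopic is orthogonal to the main topic, which mirrors the abstract's idea that subtopics far from the main topic should count for less. Normalizing the subtopic weights keeps the subtopic score on the same [0, 1] scale as the cosine terms, so `beta` and `lam` remain interpretable.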