Paper Information - A Graph-based Approach to Cross-language Multi-document Summarization

A Graph-based Approach to Cross-language Multi-document Summarization

Abstract: Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.

Index Terms: Graph-based approach, cross-language multi-document summarization.

I. INTRODUCTION

The rapid growth and online availability of information in numerous languages have made cross-language information retrieval and extraction tasks a highly relevant field of research. Cross-language document summarization aims at providing quick access to information expressed in one or more languages. More precisely, this task consists of producing a summary in a language different from the language of the source documents. In this study, we focus on English-to-French multi-document summarization. The primary motivation is to allow French readers to access the ever-increasing amount of news available through English news sources.

Recent years have seen increased interest in applying graph-theoretic models to Natural Language Processing (NLP) [1]. Graphs are a natural way to encode information for NLP: entities can be represented as nodes, and the relations between them as edges. Graph-based representations of linguistic units as diverse as words, sentences and documents give rise to efficient solutions in a variety of tasks ranging from part-of-speech tagging to information extraction and sentiment analysis. Here, we apply a graph-based ranking algorithm to multi-document summarization.

A straightforward idea for cross-language summarization is to translate the summary from one language to the other.