Using Bilingual Information for Cross-Language Document Summarization

Cross-language document summarization is the task of producing a summary in a target language (e.g., Chinese) for a set of documents in a source language (e.g., English). Existing methods for this task use either the information in the original source-language documents or the information in their target-language translations, but not both. In this study, we propose to use the bilingual information from both the source and the translated documents. Two summarization methods, SimFusion and CoRank, are proposed to leverage this bilingual information within a graph-based ranking framework for cross-language summary extraction. Experimental results on the DUC2001 dataset with manually translated reference Chinese summaries demonstrate the effectiveness of the proposed methods.
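
The abstract does not spell out how the two language sides are combined, so the following is only a minimal sketch of one way bilingual similarity could feed a graph-based ranker. It assumes that each sentence has a source-language version and a translated version, that a pairwise similarity matrix is available on each side, and that the two matrices are blended linearly before a PageRank-style iteration. The function names, the weight `lam`, and the damping factor are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def fuse_similarity(sim_src: Matrix, sim_tgt: Matrix, lam: float = 0.5) -> Matrix:
    """Blend source-side and target-side sentence similarities.

    Assumption: a simple linear interpolation with weight `lam` on the
    source side. Both matrices must be square and of the same size.
    """
    n = len(sim_src)
    return [
        [lam * sim_src[i][j] + (1.0 - lam) * sim_tgt[i][j] for j in range(n)]
        for i in range(n)
    ]


def rank_sentences(sim: Matrix, damping: float = 0.85,
                   iters: int = 100, tol: float = 1e-8) -> List[float]:
    """Score sentences with a damped random walk over the similarity graph."""
    n = len(sim)
    # Build a row-stochastic transition matrix, ignoring self-similarity.
    trans: Matrix = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        # A sentence with no neighbours jumps uniformly to any sentence.
        trans.append([v / total for v in row] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy example with three sentences; the similarity values are made up.
sim_en = [[0.0, 0.9, 0.7], [0.9, 0.0, 0.1], [0.7, 0.1, 0.0]]
sim_cn = [[0.0, 0.7, 0.9], [0.7, 0.0, 0.1], [0.9, 0.1, 0.0]]
fused = fuse_similarity(sim_en, sim_cn, lam=0.5)
scores = rank_sentences(fused)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy input the first sentence is similar to both of the others on each language side, so it receives the highest score and `best` is its index. A summarizer built on such scores would still need a redundancy-removal step when selecting sentences, which is omitted here.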
