Paper information - LIF at TAC MultiLing: Towards a Truly Language Independent Summarizer


This paper presents the LIF system for the TAC’2011 Multilingual pilot track. We followed a language-independent approach to summarization for this task. In particular, we tried to remove the following language dependencies: sentence segmentation, word segmentation, stop-word lists, and word-level relevance assessment. We applied these modifications to an MMR-based (maximal marginal relevance) system and observed little degradation on English data. The submitted system had a bug that affected all official results; we therefore present in this paper an updated set of results with relevant analysis.
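As a rough illustration of what an MMR-style selector without word segmentation or stop-word lists might look like, the sketch below scores text units by character n-gram overlap. It is not the LIF implementation: the abstract does not say how units are represented or scored, so the character trigram features, the centroid-based relevance, the `lam` trade-off parameter, and the function names `char_ngrams`, `cosine`, and `mmr_select` are all assumptions made for this example. The input units are assumed to be pre-cut spans of text, obtained by whatever means.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; no tokenizer or stop-word list is used."""
    text = " ".join(text.split())  # normalize whitespace only
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(units, k=2, lam=0.7, n=3):
    """Greedy MMR: pick k units, trading relevance to the input against redundancy."""
    vectors = [char_ngrams(u, n) for u in units]

    # Assumption: relevance is similarity to the centroid of all input units.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    selected = []
    candidates = list(range(len(units)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], centroid)
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [units[i] for i in selected]
```

Calling `mmr_select(units, k=2, lam=0.5)` on a list that contains a repeated unit should favor one copy plus a dissimilar unit, since the redundancy term penalizes the second copy. Because the features are raw character sequences, the same code runs unchanged on text in scripts that have no whitespace word boundaries.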

Benoît Favre | Firas Hmida
