Paper information - Summarizing Spoken Documents: avoiding distracting content

Summarizing Spoken Documents: avoiding distracting content

Driven by a cognitive perspective of the human summarization process, we address the problem of assessing the most relevant information in a single spoken-language document by minimizing the influence of distracting content, of which passages particularly affected by spoken-language-related problems are major representatives. Two different approaches are considered. The first, based only on the input source to be summarized, consists of a centrality-based relevance model for automatic summarization that uses support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Relevance is determined by considering the whole input source and by assuming that the information sources to be summarized comprise different topics. A thorough evaluation shows statistically significant improvements over previous approaches. The second mimics natural human behavior, in which information acquired from different sources is used to build a better understanding of a given topic. Information from different types of sources and from sources of the same type is explored. A multi-document summarization framework provides the means to assess the relevant content. A perceptual evaluation shows that mixing information leads to considerably better results, in terms of both informativeness and readability. Concerning the use of information of the same type, results show that background information on the same topic clearly improves the detection of the most important content.
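The first approach in the abstract (support sets plus geometric proximity) can be illustrated with a minimal sketch. The abstract does not specify the representation, the distance metric, or how support sets are built, so everything here is an assumption made for illustration: passages are bag-of-words count vectors, proximity is Euclidean distance, the support set of a passage is its `k` nearest passages, and a passage is ranked by how many support sets contain it. The function names (`bag_of_words`, `support_sets`, `rank_passages`) are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(passage):
    """Assumed representation: lowercase whitespace tokens with raw counts."""
    return Counter(passage.lower().split())


def euclidean_distance(a, b):
    """Geometric proximity between two count vectors (smaller means closer)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


def support_sets(vectors, k):
    """Assumed definition: the support set of passage i is its k nearest passages."""
    sets = []
    for i, vector in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: euclidean_distance(vector, vectors[j]))
        sets.append(set(others[:k]))
    return sets


def rank_passages(passages, k=2):
    """Rank passage indices by the number of support sets that contain them."""
    vectors = [bag_of_words(p) for p in passages]
    sets = support_sets(vectors, k)
    counts = [sum(i in s for s in sets) for i in range(len(passages))]
    # Higher count first; ties are broken by original position.
    return sorted(range(len(passages)), key=lambda i: (-counts[i], i))


# Illustrative input: three on-topic passages and one disfluent, distracting one.
passages = [
    "the bank raised interest rates",
    "interest rates were raised by the bank today",
    "the bank said rates may rise again",
    "uh well you know hmm",
]
ranking = rank_passages(passages, k=2)
print(ranking)
```

In this toy input the disfluent passage shares no terms with the others, so it appears in no support set and is ranked last, which is the intuition behind reducing the influence of distracting content. A realistic system would likely need better term weighting and a tuned support-set criterion.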

Ricardo Ribeiro | David Martins de Matos
