Extractive Summarization of Broadcast News: Comparing Strategies for European Portuguese

This paper compares three methods for extractive summarization of Portuguese broadcast news: feature-based, Maximal Marginal Relevance (MMR), and Latent Semantic Analysis (LSA). The main goal is to understand the level of agreement among the automatic summaries and how they compare to summaries produced by non-professional human summarizers. Results were evaluated using the ROUGE-L metric. MMR performed close to the human summarizers. The feature-based and LSA summarizers performed close to each other, and both performed worse than MMR when compared against the human-produced summaries.
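To make the Maximal Marginal Relevance strategy concrete, the following is a minimal sketch of greedy MMR sentence selection. It is not the system evaluated in the paper: the bag-of-words cosine similarity, the use of the whole story as the query, the default trade-off weight `lam`, and the function names `cosine` and `mmr_select` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Return the indices of up to k sentences, in greedy MMR selection order.

    Each step picks the candidate that maximizes
        lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected).
    Assumption: the query is the bag of words of the whole story.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    query = sum(vectors, Counter())  # whole story stands in for the query
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * cosine(vectors[i], query) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    story = [
        "the government approved the budget",
        "the government approved the budget",
        "storms hit the northern coast",
    ]
    # With a balanced weight, the repeated sentence is penalized as redundant.
    print(mmr_select(story, k=2, lam=0.5))
```

Setting `lam` to 1.0 reduces the selection to pure relevance ranking, while lower values favor sentences that differ from those already selected. In this sketch that weight is the only parameter controlling the balance between coverage and redundancy.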
