Self reinforcement for important passage retrieval

In general, centrality-based retrieval models treat all elements of the retrieval space equally, which may reduce their effectiveness. In the specific context of extractive summarization (or important passage retrieval), this means that these models do not take into account that information sources often contain lateral issues, which are hardly as important as the description of the main topic, or consist of mixtures of topics. We present a new two-stage method that first extracts a collection of key phrases and then uses them to guide a centrality-as-relevance retrieval model. We explore several approaches to integrating the key phrases into the centrality model. The proposed method is evaluated on datasets that vary in noise (noisy vs. clean) and language (Portuguese vs. English). Results show that the best variant achieves relative performance improvements of about 31% on clean data and 18% on noisy data.
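To make the two-stage idea concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple document-frequency heuristic as a stand-in for key phrase extraction, bag-of-words cosine similarity, and a degree-style centrality score to which similarity with the key terms is added. All function names, the stopword list, and the `weight` parameter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "on", "for"}


def tokenize(text):
    """Lowercase alphabetic tokens with the stopwords above removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two Counter bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_key_terms(passages, k=3):
    """Stage 1 stand-in: the k terms that occur in the most passages.

    Ties are broken alphabetically so the result is deterministic.
    """
    df = Counter()
    for p in passages:
        df.update(set(tokenize(p)))
    return sorted(df, key=lambda t: (-df[t], t))[:k]


def rank_passages(passages, key_terms=None, weight=1.0):
    """Stage 2: rank passages by centrality, optionally boosted by key terms.

    Centrality here is the sum of cosine similarities to all other passages.
    The key terms act as one extra pseudo-passage whose similarity is added
    with the given weight (one possible integration, assumed for this sketch).
    Returns passage indices, best first.
    """
    vecs = [Counter(tokenize(p)) for p in passages]
    key_vec = Counter(key_terms or [])
    scores = []
    for i, v in enumerate(vecs):
        centrality = sum(cosine(v, u) for j, u in enumerate(vecs) if j != i)
        scores.append(centrality + weight * cosine(v, key_vec))
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
```

Calling `extract_key_terms(passages)` and passing the result to `rank_passages(passages, key_terms)` gives the two stages in sequence; with `key_terms=None` the function falls back to plain centrality, which gives a baseline to compare against.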
