Abstractive Summarization of Spoken and Written Conversations Based on Phrasal Queries

We propose a novel abstractive query-based summarization system for conversations, where queries are defined as phrases reflecting a user's information needs. We rank and extract the utterances in a conversation based on the overall content and the phrasal query information. We cluster the selected sentences based on their lexical similarity and aggregate the sentences in each cluster by means of a word graph model. We propose a ranking strategy to select the best path in the constructed graph as a query-based abstract sentence for each cluster. The resulting summary consists of abstractive sentences representing the phrasal query information and the overall content of the conversation. Automatic and manual evaluation results over meeting, chat, and email conversations show that our approach significantly outperforms baselines and previous extractive models.
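
The aggregation step can be illustrated with a minimal Python sketch. It is not the system described above, only a simplified illustration under stated assumptions: words are merged into graph nodes by lowercase identity alone, an edge costs 1 / (number of times it was observed), and a path is scored as query-term coverage minus its average edge cost. The function names (build_word_graph, candidate_paths, best_path) and the scoring formula are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
import heapq

START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Merge the sentences of one cluster into a directed word graph.

    Assumption: nodes are lowercase word forms; edges[a][b] counts how
    often word b directly follows word a across the cluster.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            edges[a][b] += 1
    return edges


def candidate_paths(edges, max_len=12, limit=50):
    """Return up to `limit` START-to-END paths, cheapest first.

    Assumption: frequent transitions are cheap (cost = 1 / count).
    """
    heap = [(0.0, [START])]
    found = []
    while heap and len(found) < limit:
        cost, path = heapq.heappop(heap)
        last = path[-1]
        if last == END:
            found.append((cost, path[1:-1]))  # drop the START/END markers
            continue
        if len(path) > max_len + 1:
            continue  # path is already too long to be a useful sentence
        for nxt, count in edges[last].items():
            if nxt in path:
                continue  # never revisit a word, so paths cannot loop
            heapq.heappush(heap, (cost + 1.0 / count, path + [nxt]))
    return found


def best_path(sentences, query, min_len=4):
    """Pick one path as the query-based abstract sentence for a cluster.

    Assumption: score = query-term coverage - average edge cost.
    """
    query_terms = set(query.lower().split())
    edges = build_word_graph(sentences)
    best, best_score = None, float("-inf")
    for cost, words in candidate_paths(edges):
        if len(words) < min_len:
            continue  # skip fragments that are too short
        coverage = len(query_terms & set(words)) / max(len(query_terms), 1)
        score = coverage - cost / len(words)
        if score > best_score:
            best, best_score = words, score
    return " ".join(best) if best else ""


cluster = [
    "i think the remote control needs new batteries",
    "the remote control needs a redesign",
    "the remote control needs new batteries soon maybe",
]
print(best_path(cluster, "remote control batteries"))
```

On this toy cluster the selected path is "the remote control needs new batteries", a sentence that covers every query term and does not appear verbatim in any of the three inputs. A fuller treatment would likely also use part-of-speech information when merging nodes and a fluency signal when ranking paths; both are omitted here to keep the sketch short.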
