Towards Topic Labeling with Phrase Entailment and Aggregation

We propose a novel framework for topic labeling that assigns the most representative phrases to a given set of sentences covering the same topic. We build an entailment graph over phrases extracted from the sentences, and use the entailment relations to identify and select the most relevant phrases. We then aggregate the selected phrases by means of phrase generalization and merging. We motivate our approach by applying it to conversational data, and show that our framework improves performance significantly over baseline algorithms.
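The three stages named in the abstract (entailment graph, phrase selection, aggregation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the method evaluated in the paper: the token-containment `entails` test stands in for a trained entailment recognizer, `select_phrases` uses a hypothetical "entails the most other phrases" ranking, and `aggregate` only drops subsumed phrases rather than performing real generalization and merging.

```python
from itertools import permutations


def entails(premise: str, hypothesis: str) -> bool:
    """Toy entailment test (assumption): the premise entails the hypothesis
    when it contains every token of the hypothesis."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


def build_entailment_graph(phrases):
    """Map each phrase to the set of other phrases it entails."""
    graph = {p: set() for p in phrases}
    for a, b in permutations(phrases, 2):
        if entails(a, b):
            graph[a].add(b)
    return graph


def select_phrases(graph, k=3):
    """Hypothetical selection rule: keep the k phrases that entail the most
    other phrases, breaking ties alphabetically."""
    ranked = sorted(graph, key=lambda p: (-len(graph[p]), p))
    return ranked[:k]


def aggregate(selected, graph):
    """Crude stand-in for merging: drop a selected phrase when another
    selected phrase strictly entails it (and is not entailed back)."""
    return [
        p for p in selected
        if not any(
            p in graph[q] and q not in graph[p]
            for q in selected if q != p
        )
    ]


# Hypothetical phrases extracted from sentences on one conversational topic.
phrases = [
    "budget meeting",
    "budget meeting on friday",
    "meeting",
    "project deadline",
    "deadline",
]
graph = build_entailment_graph(phrases)
selected = select_phrases(graph, k=3)
labels = aggregate(selected, graph)
print(labels)
```

In this toy setup the more specific phrase absorbs the phrases it entails, so redundant candidates disappear from the final label set. A real system would replace `entails` with a learned classifier and would need a principled choice between preferring specific or general phrases.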
