Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts

Existing keyphrase extraction methods suffer from the data sparsity problem when applied to short and informal texts, especially microblog messages. Enriching the context is one way to alleviate this problem. Because conversations are formed by reposts and replies, they provide useful clues for recognizing essential content in target posts and are therefore helpful for keyphrase identification. In this paper, we present a neural keyphrase extraction framework for microblog posts that takes their conversation context into account, where four types of neural encoders, namely averaged embedding, RNN, attention, and memory networks, are proposed to represent the conversation context. Experimental results on Twitter and Weibo datasets show that our framework with such encoders outperforms state-of-the-art approaches.
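To make the idea of a conversation context encoder concrete, the following is a minimal sketch of the two simplest encoder types named above, averaged embedding and attention. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy 2-dimensional embeddings, and the dot-product scoring against a fixed query vector are all hypothetical choices, since the abstract does not specify how attention weights are computed.

```python
import math
from typing import List

Vector = List[float]


def average_encoder(context: List[Vector]) -> Vector:
    """Encode the conversation as the mean of its word embeddings."""
    dim = len(context[0])
    return [sum(vec[i] for vec in context) / len(context) for i in range(dim)]


def attention_encoder(context: List[Vector], query: Vector) -> Vector:
    """Encode the conversation as a softmax-weighted sum of its embeddings.

    Assumption: each word is scored by its dot product with `query`;
    a trained model would learn the scoring function instead.
    """
    scores = [sum(c * q for c, q in zip(vec, query)) for vec in context]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(context[0])
    return [sum(w * vec[i] for w, vec in zip(weights, context)) for i in range(dim)]


# Toy embeddings for three words in a conversation (hypothetical values).
conversation = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

avg = average_encoder(conversation)
uniform_att = attention_encoder(conversation, query=[0.0, 0.0])
focused_att = attention_encoder(conversation, query=[10.0, 0.0])
```

With an all-zero query every word receives the same weight, so the attention encoder reduces to the averaged embedding; a non-zero query shifts the weight toward the words that match it. In a full model, the resulting context vector would be combined with the target post's representation before keyphrase tagging.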
