Multi-task Pairwise Neural Ranking for Hashtag Segmentation

Hashtags are often employed on social media and beyond to add metadata to a textual utterance, with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, because hashtags are ad-hoc conventions that frequently join multiple words together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches to hashtag segmentation that frame it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches reduce hashtag segmentation errors by 24.6% relative to the current state-of-the-art method. Finally, we demonstrate that the deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieve a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset.
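The abstract frames segmentation as pairwise ranking between candidate segmentations. The Python sketch below illustrates only that framing and rests on loud assumptions. Candidates are enumerated exhaustively, and the paper's neural pairwise rankers are replaced by a hypothetical stand-in: a logistic function of the difference between toy unigram log-probabilities. `COUNTS`, `OOV_LOGPROB_PER_CHAR`, `candidates`, `score`, `prefer`, and `rank` are illustrative names, not the authors' code or data. Candidates are then ordered by their summed pairwise win probabilities.

```python
import math
from itertools import combinations

# Hypothetical toy unigram counts standing in for a real language model.
COUNTS = {"no": 500, "sleep": 300, "s": 20, "lee": 3, "p": 15, "nos": 2}
TOTAL = sum(COUNTS.values())
OOV_LOGPROB_PER_CHAR = math.log(1e-6)  # assumed penalty for unknown segments


def candidates(tag):
    """Yield every segmentation of `tag` (2**(len(tag) - 1) of them)."""
    if not tag:
        return
    n = len(tag)
    for mask in range(1 << (n - 1)):
        segs, start = [], 0
        for i in range(1, n):
            if (mask >> (i - 1)) & 1:  # bit i-1 set => split before position i
                segs.append(tag[start:i])
                start = i
        segs.append(tag[start:])
        yield tuple(segs)


def score(seg):
    """Unigram log-probability, penalizing unknown segments per character."""
    total = 0.0
    for w in seg:
        c = COUNTS.get(w)
        total += math.log(c / TOTAL) if c else OOV_LOGPROB_PER_CHAR * len(w)
    return total


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prefer(a, b):
    """Stand-in pairwise model: probability that segmentation a beats b."""
    return sigmoid(score(a) - score(b))


def rank(tag):
    """Rank candidates by their summed pairwise win probabilities."""
    cands = list(candidates(tag))
    wins = {c: 0.0 for c in cands}
    for a, b in combinations(cands, 2):
        p = prefer(a, b)
        wins[a] += p
        wins[b] += 1.0 - p  # prefer(b, a) == 1 - prefer(a, b)
    return sorted(cands, key=lambda c: wins[c], reverse=True)


if __name__ == "__main__":
    print(" ".join(rank("nosleep")[0]))  # prints "no sleep"
```

With this toy `prefer`, the aggregate order coincides with sorting by `score`. A learned pairwise model need not be transitive, which is one reason to aggregate pairwise judgments rather than compare scores directly. Exhaustive enumeration yields 2^(n-1) candidates and quadratically many pairs, so it only suits short hashtags; longer ones would need a pruned candidate set.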