Char-RNN and Active Learning for Hashtag Segmentation

We explore the ability of a character-level recurrent neural network (char-RNN) to segment hashtags. To avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset by selecting an informative training subset. The approach does not require any language-specific settings and is evaluated on two languages that differ in their degree of inflection.
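As an illustration of the synthetic data idea, the sketch below turns a word n-gram into a hashtag-like string with one boundary label per character, which is the kind of input a character-level tagger could be trained on. The function names `make_example` and `segment`, the 0/1 labeling scheme, and the lowercasing step are assumptions made for this example; the abstract does not specify the encoding actually used.

```python
from typing import List, Sequence, Tuple


def make_example(ngram: Sequence[str]) -> Tuple[str, List[int]]:
    """Concatenate the words of an n-gram into a hashtag-like string.

    Returns the string and one label per character: 1 if a word
    boundary follows that character, 0 otherwise. (Hypothetical
    labeling scheme, chosen for illustration.)
    """
    chars: List[str] = []
    labels: List[int] = []
    last_word = len(ngram) - 1
    for i, word in enumerate(ngram):
        word = word.lower()
        for j, ch in enumerate(word):
            chars.append(ch)
            ends_word = j == len(word) - 1
            # No boundary is marked after the final word of the n-gram.
            labels.append(1 if ends_word and i != last_word else 0)
    return "".join(chars), labels


def segment(text: str, labels: Sequence[int]) -> List[str]:
    """Split text after every character whose label is 1."""
    words: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


text, labels = make_example(["active", "learning"])
print(text, labels)
print(segment(text, labels))
```

Because the labels are derived from the n-gram itself, each synthetic example comes with its gold segmentation for free, and `segment` can decode predicted labels back into words at inference time.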
