Named Entity Recognition for Social Media Texts with Semantic Augmentation

Existing approaches to named entity recognition (NER) suffer from data sparsity when applied to short, informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Because rich semantic information is implicitly preserved in pre-trained word embeddings, they are an ideal resource for such augmentation. In this paper, we propose a neural approach to NER for social media texts that takes both local semantics (from the running text) and augmented semantics into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate it, respectively. Extensive experiments on three benchmark datasets collected from English and Chinese social media platforms demonstrate the superiority of our approach over previous studies on all three datasets.
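The augment-then-gate idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, weight shapes, and the choice of bilinear attention are our own assumptions; in the paper the neighbor vectors would come from pre-trained embeddings of semantically similar words and the weights would be learned end-to-end.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def augment_token(h, neighbor_embs, W_att, W_gate):
    """Fuse a token's local hidden state with attended neighbor embeddings.

    h             : (d,)    local representation from the running text
    neighbor_embs : (k, d)  pre-trained embeddings of semantically similar words
    W_att         : (d, d)  bilinear attention weights (illustrative)
    W_gate        : (d, 2d) gate weights over [h; a]   (illustrative)
    """
    # attentive semantic augmentation: weight each neighbor by relevance to h
    scores = neighbor_embs @ (W_att @ h)          # (k,)
    alpha = softmax(scores)
    a = alpha @ neighbor_embs                     # (d,) augmented semantic vector

    # gate module: per-dimension balance between local and augmented semantics
    g = 1.0 / (1.0 + np.exp(-(W_gate @ np.concatenate([h, a]))))
    return g * h + (1.0 - g) * a

rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=d)
neighbors = rng.normal(size=(5, d))
W_att = rng.normal(size=(d, d))
W_gate = rng.normal(size=(d, 2 * d))
fused = augment_token(h, neighbors, W_att, W_gate)
print(fused.shape)  # (8,)
```

The gate lets the model fall back on local context when the retrieved neighbors are noisy, which matters for informal social media text where embedding neighbors can be unreliable.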
