Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition

We propose a new neural transfer method, termed Dual Adversarial Transfer Network (DATNet), for low-resource Named Entity Recognition (NER). Specifically, we investigate two variants of DATNet, DATNet-F and DATNet-P, to explore effective feature fusion between high-resource and low-resource data. To address noisy and imbalanced training data, we propose a novel Generalized Resource-Adversarial Discriminator (GRAD). Additionally, adversarial training is adopted to boost model generalization. In experiments, we examine the effects of the different components of DATNet across domains and languages, and show that significant improvements can be obtained, especially for low-resource data, without adding any hand-crafted features or pre-trained language models.
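
The abstract does not say how the adversarial perturbation is computed. As a rough illustration only, the sketch below assumes a common formulation in which the worst-case perturbation is approximated by the loss gradient with respect to the input, rescaled to a fixed L2 norm. A toy logistic classifier stands in for the NER model, and every name here (`adversarial_example`, `adversarial_training_loss`, `epsilon`) is hypothetical rather than taken from DATNet.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a toy logistic classifier (stand-in model)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, x, y):
    """Gradient of the loss with respect to the input features x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * wi for wi in w]


def adversarial_example(w, x, y, epsilon):
    """Shift x by epsilon along the normalized loss gradient (assumed scheme)."""
    g = input_gradient(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(x)
    return [xi + epsilon * gi / norm for xi, gi in zip(x, g)]


def adversarial_training_loss(w, x, y, epsilon=0.1):
    """Sum of the clean loss and the loss on the perturbed input."""
    x_adv = adversarial_example(w, x, y, epsilon)
    return loss(w, x, y) + loss(w, x_adv, y)


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]   # hypothetical model weights
    x = [1.0, 2.0, 0.5]    # hypothetical input features (e.g. an embedding)
    y = 1                  # gold label
    print("clean loss:", loss(w, x, y))
    print("adversarial loss:", loss(w, adversarial_example(w, x, y, 0.1), y))
    print("training objective:", adversarial_training_loss(w, x, y, 0.1))
```

In a neural tagger the same idea would typically be applied to the word or character embeddings instead of raw features, with the gradient obtained by backpropagation. The perturbation size `epsilon` is a hyperparameter that this sketch fixes arbitrarily.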
