Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition

Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve the performance of NER, they require large amounts of training data and therefore can hardly scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a resource-rich language to low-resource languages through pre-trained multilingual language models. Instead of using training data in the target languages, cross-lingual NER relies only on training data in the source languages, optionally augmented with translated training data derived from the source languages. However, existing cross-lingual NER methods do not make good use of the rich unlabeled data in target languages, which is relatively easy to collect in industry applications. To address these opportunities and challenges, in this paper we describe our novel practice at Microsoft of leveraging such large amounts of unlabeled data in target languages in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. An empirical study on three benchmark data sets verifies that our approach establishes new state-of-the-art performance by a clear margin. The NER techniques reported in this paper are now on their way to becoming a fundamental component for Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine. Moreover, our techniques will also serve as part of the Spoken Language Understanding module for a commercial voice assistant. We plan to open source the code of the prototype framework after deployment.
