Toward Fully Exploiting Heterogeneous Corpus: A Decoupled Named Entity Recognition Model with Two-stage Training

Named Entity Recognition (NER) is a fundamental and widely used task in natural language processing (NLP), and NER models are generally trained on human-annotated corpora. However, data annotation is costly and time-consuming, which restricts the scale of training data and in turn creates a performance bottleneck for NER models. In practice, we can conveniently collect large-scale entity dictionaries and distantly supervised data. However, the collected dictionaries lack semantic context, and the distantly supervised training instances contain substantial noise, both of which have uncertain effects on NER models when incorporated directly into the high-quality training set. To address this issue, we propose a BERT-based decoupled NER model with two-stage training that appropriately exploits the heterogeneous corpus, including dictionaries, distantly supervised instances, and human-annotated instances. Our decoupled model consists of a Mention-BERT and a Context-BERT, which learn from the context-deficient dictionaries and the noisy distantly supervised instances, respectively, at the pre-training stage. At the unified-training stage, the two BERTs are trained together on human-annotated data to predict the correct labels for candidate regions. Empirical studies on three Chinese NER datasets demonstrate that our method achieves significant improvements over several baselines, establishing new state-of-the-art performance.
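To make the decoupled architecture concrete, below is a minimal sketch of how a two-encoder, span-classification model of this kind could be assembled. The abstract does not specify the fusion mechanism or the exact inputs to each encoder, so the class name, the span-only input to Mention-BERT, and the concatenation-based fusion are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a decoupled span-classification NER model, assuming:
# - Mention-BERT sees the candidate span text alone (dictionary-style input),
# - Context-BERT sees the full sentence containing the candidate region,
# - the two [CLS] representations are fused by concatenation (an assumption).
import torch
import torch.nn as nn
from transformers import BertModel


class DecoupledNER(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "bert-base-chinese"):
        super().__init__()
        # Two separate BERT encoders, pre-trained in stage one on different corpora:
        # Mention-BERT on entity dictionaries, Context-BERT on distantly supervised data.
        self.mention_bert = BertModel.from_pretrained(model_name)
        self.context_bert = BertModel.from_pretrained(model_name)
        hidden = self.mention_bert.config.hidden_size
        # Label classifier over the concatenated mention and context vectors.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, mention_ids, mention_mask, context_ids, context_mask):
        # Pooled [CLS] output of each encoder serves as its summary representation.
        m = self.mention_bert(input_ids=mention_ids,
                              attention_mask=mention_mask).pooler_output
        c = self.context_bert(input_ids=context_ids,
                              attention_mask=context_mask).pooler_output
        # Stage two (unified training) would optimize both encoders and this
        # classifier jointly on human-annotated candidate regions.
        return self.classifier(torch.cat([m, c], dim=-1))
```

Under this reading, the two-stage schedule amounts to updating each encoder separately on its own weakly supervised corpus first, then fine-tuning the whole model end-to-end on the human-annotated spans.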
