Named Entity Recognition via Noise Aware Training Mechanism with Data Filter

Named entity recognition (NER) is a fundamental task in natural language processing, and it is commonly believed that more data benefits the model. However, not all data help with generalization: some samples contain ambiguous entities or noisy labels. Existing methods cannot distinguish hard samples from noisy samples well, and this becomes particularly challenging in the presence of overfitting. This paper proposes a new method, Noise-Aware-with-Filter (NAF), that addresses the problem from two sides. From the data perspective, we design a Logit-Maximum-Difference (LMD) mechanism, which maximizes the diversity between different samples to help the model identify noisy ones. From the model perspective, we design an Incomplete-Trust (In-trust) loss function, which augments the CRF loss (L_CRF) with a robust Distrust-Cross-Entropy (DCE) term. The proposed In-trust loss effectively alleviates the overfitting caused by previous loss functions. Experiments on six real-world Chinese and English NER datasets show that NAF outperforms previous methods and achieves state-of-the-art (SOTA) results on the CoNLL-2003 and CoNLL++ datasets.
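The abstract only names the two components without giving their formulas, so the following is a minimal PyTorch sketch of one plausible reading: an LMD-style margin score for flagging noisy samples, and an In-trust-style loss that mixes the model's predictions with the observed labels through a Distrust-Cross-Entropy term. The function names, the hyperparameters `alpha`, `beta`, and `delta`, the use of token-level cross-entropy as a stand-in for L_CRF, and the exact DCE form are all assumptions, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F


def lmd_score(logits, labels):
    """Hypothetical Logit-Maximum-Difference score: the margin between the
    logit of the observed label and the largest competing logit. Tokens whose
    margin stays small or negative during training are candidates for noisy
    labels. Assumes `labels` contains only valid class indices (no padding).
    """
    gold = logits.gather(-1, labels.unsqueeze(-1)).squeeze(-1)        # logit of the observed label
    masked = logits.scatter(-1, labels.unsqueeze(-1), float("-inf"))  # hide the gold logit
    other = masked.max(dim=-1).values                                 # best competing logit
    return gold - other


def in_trust_loss(logits, labels, alpha=1.0, beta=1.0, delta=0.5,
                  ignore_index=-100):
    """Hypothetical In-trust-style loss: a standard cross-entropy term plus a
    robust DCE term that only partially trusts the (possibly noisy) labels.
    logits: (batch, seq_len, num_labels); labels: (batch, seq_len).
    """
    num_labels = logits.size(-1)
    active = labels != ignore_index

    # Standard term (a token-level stand-in for the CRF negative log-likelihood).
    ce = F.cross_entropy(logits.view(-1, num_labels), labels.view(-1),
                         ignore_index=ignore_index, reduction="mean")

    # Distrust-Cross-Entropy: penalize disagreement with a mixture of the
    # model's own belief and the observed label, rather than the label alone.
    probs = logits.softmax(dim=-1)                                     # model belief p
    one_hot = F.one_hot(labels.clamp(min=0), num_labels).float()       # observed label q
    mixture = delta * probs + (1.0 - delta) * one_hot
    dce = -(probs * torch.log(mixture + 1e-8)).sum(dim=-1)
    dce = dce[active].mean()

    return alpha * ce + beta * dce
```

In this sketch, `delta` controls how much the DCE term trusts the annotated labels versus the model's current predictions; the abstract does not specify how these weights are set, so they are illustrative defaults only.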
