A Realistic Study of Auto-regressive Language Models for Named Entity Typing and Recognition

Despite the impressive results of language models on named entity recognition (NER), their generalization to varied textual genres, a growing set of entity types, and new entities remains a challenge. Collecting thousands of annotations for training or fine-tuning in each new case is expensive and time-consuming. In contrast, humans can easily identify named entities given only simple instructions. Inspired by this, we challenge the reliance on large datasets and study pre-trained language models for NER in a meta-learning setup. First, we test named entity typing (NET) in a zero-shot transfer scenario. Then, we perform NER by providing a few examples at inference time. We propose a method to select seen and rare/unseen names when only the pre-trained model is accessible, and we report results on these groups. The results show that auto-regressive language models used as meta-learners can perform NET and NER fairly well, especially for regular or seen names; that name irregularity, when frequently associated with a certain entity type, can become an effective exploitable cue; that names containing words foreign to the model hurt results the most; and that the model seems to rely more on name cues than on context cues in few-shot NER.
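
To make the seen vs. rare/unseen split concrete, the sketch below scores a name by its perplexity under the pre-trained model alone, treating high-perplexity names as rare/unseen candidates. This is a minimal illustration assuming a likelihood-based selection criterion; the model choice (GPT-2), the scoring function, and the example names are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch: score name familiarity using only the pre-trained LM,
# assuming a likelihood-based criterion (an assumption, not the paper's
# exact method).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def name_perplexity(name: str) -> float:
    """Perplexity of a (multi-token) name string under the frozen LM."""
    ids = tokenizer(name, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # negative log-likelihood of each token given its left context.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical split: names the model finds unlikely are treated as
# rare/unseen candidates; familiar names score low.
for name in ["New York", "Qhylzorb Vexatti"]:
    print(f"{name}: {name_perplexity(name):.1f}")
```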

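The few-shot NER step can likewise be sketched as in-context prompting: labelled demonstrations are concatenated into a prompt and the auto-regressive model completes the entity line for a new sentence. The template, label rendering, and decoding settings below are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of few-shot NER via in-context prompting of an
# auto-regressive LM; the prompt format is an illustrative assumption.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# A handful of labelled demonstrations supplied only at inference time.
demonstrations = [
    ("Paris is the capital of France .", "Paris (location), France (location)"),
    ("Tim Cook leads Apple .", "Tim Cook (person), Apple (organization)"),
]
query = "Angela Merkel visited Microsoft ."

prompt = ""
for sentence, entities in demonstrations:
    prompt += f"Sentence: {sentence}\nEntities: {entities}\n\n"
prompt += f"Sentence: {query}\nEntities:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,  # greedy decoding for a deterministic sketch
        pad_token_id=tokenizer.eos_token_id,
    )
# Keep only the newly generated tokens, i.e. the model's answer line.
answer = tokenizer.decode(output[0][inputs.input_ids.shape[1]:])
print(answer.split("\n")[0].strip())
```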
[1]  P. Kobeleva Second Language Listening and Unfamiliar Proper Names: Comprehension Barrier? , 2012 .

[2]  Xiao Huang,et al.  TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition , 2020, ACL.

[3]  Xuanjing Huang,et al.  Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study , 2020, AAAI.

[4]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[5]  Teng Ren,et al.  Learning Named Entity Tagger using Domain-Specific Dictionary , 2018, EMNLP.

[6]  Vincent Guigue,et al.  Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization , 2019, ECIR.

[7]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[8]  Marco Guerini,et al.  Toward zero-shot Entity Recognition in Task-oriented Conversational Agents , 2018, SIGDIAL Conference.

[9]  Steven Bethard,et al.  A Survey on Recent Advances in Named Entity Recognition from Deep Learning models , 2018, COLING.

[10]  Jiwei Li,et al.  A Unified MRC Framework for Named Entity Recognition , 2019, ACL.

[11]  Bowen Yu,et al.  Coarse-to-Fine Pre-training for Named Entity Recognition , 2020, EMNLP.

[12]  Thamar Solorio,et al.  A Multi-task Approach for Named Entity Recognition in Social Media Data , 2017, NUT@EMNLP.

[13]  Leon Derczynski,et al.  Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition , 2017, NUT@EMNLP.

[14]  James R. Glass,et al.  A Conversational Movie Search System Based on Conditional Random Fields , 2012, INTERSPEECH.

[15]  Fabio A. González,et al.  Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media , 2018, NAACL.

[16]  Lori Markson,et al.  Capacities underlying word learning , 1998, Trends in Cognitive Sciences.

[17]  Dan Klein,et al.  Calibrate Before Use: Improving Few-Shot Performance of Language Models , 2021, ICML.

[18]  Xianpei Han,et al.  A Rigourous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land? , 2020, ArXiv.

[19]  Pierre Lison,et al.  Named Entity Recognition without Labelled Data: A Weak Supervision Approach , 2020, ACL.

[20]  Úlfar Erlingsson,et al.  The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks , 2018, USENIX Security Symposium.

[21]  Roland Vollgraf,et al.  Pooled Contextualized Embeddings for Named Entity Recognition , 2019, NAACL.

[22]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[23]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[24]  Hector J. Levesque,et al.  On our best behaviour , 2014, Artif. Intell..

[25]  Erik Cambria,et al.  Label Embedding for Zero-shot Fine-grained Named Entity Typing , 2016, COLING.

[26]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.