Probing Pre-trained Auto-regressive Language Models for Named Entity Typing and Recognition

Multiple works have proposed probing language models (LMs) for generalization in named entity (NE) typing (NET) and recognition (NER). However, little has been done in this direction for auto-regressive models, despite their popularity and their potential to express a wide variety of NLP tasks in the same unified format. We propose a new methodology, inspired by human linguistic behavior, to probe auto-regressive LMs for NET and NER generalization by resorting to meta-learning. We study NEs of various types individually by designing a zero-shot transfer strategy for NET. We then probe the model for NER by providing a few examples at inference time. We also introduce a novel procedure to assess the model's memorization of NEs and report its impact on the results. Our findings show that: 1) GPT2, a common pre-trained auto-regressive LM, performs both tasks fairly well without any fine-tuning for NET or NER; 2) name irregularity, when common for an NE type, can be an effective exploitable cue; 3) the model seems to rely more on NE cues than on contextual cues in few-shot NER; 4) NEs whose words were absent during LM pre-training are very challenging for both NET and NER.
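The abstract describes two probing setups: a zero-shot transfer strategy for NET and in-context examples for few-shot NER. Below is a minimal, illustrative sketch of how such probing could be implemented with GPT2 and the HuggingFace transformers library. The prompt templates and label verbalizations are assumptions made for illustration, not the paper's exact prompts.

```python
# Minimal sketch of the two probing setups described in the abstract,
# assuming the HuggingFace transformers library and GPT2. The prompt
# templates and label verbalizations below are illustrative
# assumptions, not the paper's exact prompts.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# --- Zero-shot NE typing: score one verbalized statement per type ---

TYPE_VERBALIZATIONS = {  # hypothetical label-to-word mapping
    "PER": "person",
    "LOC": "location",
    "ORG": "organization",
}

def sequence_log_prob(text: str) -> float:
    """Total log-probability of `text` under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean token-level
        # cross-entropy; negate and rescale to a summed log-probability.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def zero_shot_type(entity: str) -> str:
    """Assign the type whose verbalized statement the LM scores highest."""
    scores = {
        label: sequence_log_prob(f"{entity} is a {word}.")
        for label, word in TYPE_VERBALIZATIONS.items()
    }
    return max(scores, key=scores.get)

print(zero_shot_type("Paris"))  # expected: LOC (model-dependent)

# --- Few-shot NER: a handful of in-context examples at inference ---

prompt = (
    "Sentence: Barack Obama visited Berlin.\n"
    "Person entities: Barack Obama\n\n"
    "Sentence: Angela Merkel met Emmanuel Macron in Paris.\n"
    "Person entities: Angela Merkel, Emmanuel Macron\n\n"
    "Sentence: Tim Cook announced the new iPhone.\n"
    "Person entities:"
)
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=10, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0, ids.size(1):]))  # ideally " Tim Cook"
```

One caveat on the zero-shot scoring: because verbalizations such as "person" and "organization" can differ in token count, summed log-probabilities carry a length bias; per-token (length-normalized) log-probability is a common alternative when candidate statements vary in length.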
