SpanProto: A Two-stage Span-based Prototypical Network for Few-shot Named Entity Recognition

Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. Previous methods address this problem with token-wise classification, which ignores entity boundary information and whose performance is inevitably degraded by the large number of non-entity tokens. To this end, we propose a span-based prototypical network (SpanProto) that tackles few-shot NER with a two-stage approach consisting of span extraction and mention classification. In the span extraction stage, we transform the sequential tags into a global boundary matrix, enabling the model to focus on explicit boundary information. For mention classification, we leverage prototypical learning to capture the semantic representation of each labeled span and help the model adapt better to novel-class entities. To further improve model performance, we separate out the false positives that are generated by the span extractor but not labeled in the current episode set, and then present a margin-based loss to push them away from each prototype region. Experiments on multiple benchmarks demonstrate that our model outperforms strong baselines by a large margin.
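As a rough illustration of the two stages described above, the following pure-Python sketch converts BIO tags into a boundary matrix, averages span vectors into class prototypes, classifies a span by its nearest prototype, and applies a hinge-style margin penalty to an unlabeled span. It is not the authors' implementation: the BIO input format, the Euclidean distance, the hinge form of the loss, and all function names (`bio_to_spans`, `boundary_matrix`, `build_prototypes`, `classify`, `margin_loss`) are assumptions made for this example, and the span vectors are toy values rather than encoder outputs.

```python
import math
from typing import Dict, List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, label) spans, end inclusive, from BIO tags."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # current span keeps growing
        if start is not None:
            spans.append((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]  # a stray I- tag leniently opens a span
    return spans


def boundary_matrix(tags: List[str]) -> List[List[int]]:
    """matrix[i][j] == 1 iff tokens i..j form a labeled entity span."""
    n = len(tags)
    matrix = [[0] * n for _ in range(n)]
    for start, end, _ in bio_to_spans(tags):
        matrix[start][end] = 1
    return matrix


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(vectors: List[List[float]],
                     labels: List[str]) -> Dict[str, List[float]]:
    """Average the support-set span vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(vec: List[float], protos: Dict[str, List[float]]) -> str:
    """Assign the label of the nearest prototype."""
    return min(protos, key=lambda lab: euclidean(vec, protos[lab]))


def margin_loss(vec: List[float], protos: Dict[str, List[float]],
                margin: float) -> float:
    """Assumed hinge form: penalize an unlabeled span that lies closer
    than `margin` to any prototype."""
    return sum(max(0.0, margin - euclidean(vec, p)) for p in protos.values())


tags = ["B-PER", "I-PER", "O", "B-LOC"]
matrix = boundary_matrix(tags)

# Toy 2-d span vectors standing in for encoder outputs (assumption).
protos = build_prototypes([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]],
                          ["PER", "PER", "LOC"])
predicted = classify([1.5, 0.5], protos)
penalty = margin_loss([1.0, 3.0], protos, margin=5.0)
```

In this toy run, `matrix` marks the cells (0, 1) and (3, 3), one per entity span, so every other cell acts as a negative example for boundary detection. The margin penalty is non-zero only for prototypes within the margin of the unlabeled span, which is one simple way to keep such spans outside every prototype region.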
