A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition

Building reliable named entity recognition (NER) systems from limited annotated data has recently attracted much attention. Nearly all existing work relies heavily on domain-specific resources, such as external lexicons and knowledge bases. However, such resources are often unavailable, and they are difficult and expensive to construct, which has become a key obstacle to wider adoption. To tackle this problem, we propose RDANER, a novel robust and domain-adaptive approach for low-resource NER that uses only cheap and easily obtainable resources. Extensive experiments on three benchmark datasets demonstrate that our approach achieves the best performance when using only cheap and easily obtainable resources, and delivers competitive results against state-of-the-art methods that rely on hard-to-obtain domain-specific resources. All our code and corpora can be found at https://github.com/houking-can/RDANER.
