CycleNER: An Unsupervised Training Approach for Named Entity Recognition

Named Entity Recognition (NER) is a crucial natural language understanding task for many downstream applications such as question answering and retrieval. Despite significant progress in developing NER models for multiple languages and domains, scaling to emerging and/or low-resource domains remains challenging due to the costly nature of acquiring training data. We propose CycleNER, an unsupervised approach based on cycle-consistency training that uses two functions to carry out the NER task: (i) sentence-to-entity (S2E) and (ii) entity-to-sentence (E2S). CycleNER requires no annotations; it needs only a set of unlabeled sentences and an independent set of entity examples. Through cycle-consistency training, the output of one function is used as input to the other (e.g., S2E → E2S) to align the representation spaces of the two functions and thereby enable unsupervised training. Evaluation on several domains against supervised and unsupervised competitors shows that CycleNER is highly competitive with only a few thousand input sentences: it reaches 73% of supervised performance on CoNLL03 without any annotations, while significantly outperforming unsupervised approaches.
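The training procedure can be summarized as two alternating reconstruction cycles. Below is a minimal sketch, assuming both S2E and E2S are pretrained seq2seq models (here T5 via Hugging Face transformers) and that entities are linearized into a plain text sequence; the actual CycleNER architecture, entity linearization format, and optimization details are not specified here, and the data variables are illustrative placeholders.

```python
# Sketch of cycle-consistency training with two seq2seq functions.
# Assumptions (not from the paper): T5 backbones, a pad/truncate-based
# tokenizer, and freezing the generating model in each half-cycle.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
s2e = T5ForConditionalGeneration.from_pretrained("t5-small")  # sentence -> entity sequence
e2s = T5ForConditionalGeneration.from_pretrained("t5-small")  # entity sequence -> sentence

opt = torch.optim.AdamW(list(s2e.parameters()) + list(e2s.parameters()), lr=3e-4)

sentences = ["Barack Obama visited Paris ."]               # unlabeled sentences
entity_seqs = ["Barack Obama <person> Paris <location>"]   # independent entity examples

def encode(texts):
    # Tokenize a batch of strings into input ids.
    return tok(texts, return_tensors="pt", padding=True, truncation=True).input_ids

for step in range(1):  # single illustrative step
    # S-cycle: sentence -> (frozen S2E) pseudo entity sequence,
    # then train E2S to reconstruct the original sentence from it.
    with torch.no_grad():
        pseudo_ents = s2e.generate(encode(sentences), max_length=64)
    loss_s = e2s(input_ids=pseudo_ents, labels=encode(sentences)).loss

    # E-cycle: entity sequence -> (frozen E2S) pseudo sentence,
    # then train S2E to recover the original entity sequence from it.
    with torch.no_grad():
        pseudo_sents = e2s.generate(encode(entity_seqs), max_length=64)
    loss_e = s2e(input_ids=pseudo_sents, labels=encode(entity_seqs)).loss

    (loss_s + loss_e).backward()
    opt.step()
    opt.zero_grad()
```

In each half-cycle the generating model is held fixed while only the reconstructing model receives gradients, a common way to keep the pseudo-targets stable during cycle training; whether CycleNER freezes the generator in exactly this way is an assumption of this sketch.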
