Transfer Learning for Named Entity Recognition in Historical Corpora

We report on our participation in the 2020 CLEF HIPE shared task as team Ehrmama, focusing on bundle 3: Named Entity Recognition and Classification (NERC) with coarse- and fine-grained tags. Motivated by an interest in assessing the added value of transfer learning for NERC on historical corpora, we propose an architecture with two components: (i) a modular embedding layer, where we combine newly trained and pre-trained embeddings, and (ii) a task-specific BiLSTM-CRF layer. We find that character-level embeddings, BERT, and a document-level data split are the most important factors in improving our results, while in-domain FastText embeddings and a single-task (as opposed to multi-task) approach yield only minor gains. Our results confirm that pre-trained language models can be beneficial for NERC on low-resourced historical corpora.
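The two components can be illustrated with a minimal numpy sketch: a modular embedding layer that concatenates per-token vectors from independent sources, and Viterbi decoding over emission and transition scores, which is the inference step of a CRF output layer. All dimensions, the random stand-in scores, and the absence of the BiLSTM encoder are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Modular embedding layer: concatenate independent embedding sources ---
# Hypothetical sizes; the paper combines character-level, in-domain FastText,
# and BERT embeddings, but the actual dimensions are assumptions here.
CHAR_DIM, FASTTEXT_DIM, BERT_DIM = 50, 300, 768

def embed(char_vecs, fasttext_vecs, bert_vecs):
    """Concatenate per-token vectors from each embedding module."""
    return np.concatenate([char_vecs, fasttext_vecs, bert_vecs], axis=-1)

# --- CRF inference: Viterbi decoding over per-token emission scores ---
def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags); transitions[i, j]: score of tag i -> tag j."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                    # best score ending in each tag
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        cand = score[:, None] + transitions        # cand[i, j]: prev tag i -> tag j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    best = [int(score.argmax())]                   # follow back-pointers from the end
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

tokens, num_tags = 4, 5
x = embed(rng.normal(size=(tokens, CHAR_DIM)),
          rng.normal(size=(tokens, FASTTEXT_DIM)),
          rng.normal(size=(tokens, BERT_DIM)))     # shape (4, 1118)
# A BiLSTM would map x to per-tag emission scores; random scores stand in here.
emissions = rng.normal(size=(tokens, num_tags))
transitions = rng.normal(size=(num_tags, num_tags))
tags = viterbi_decode(emissions, transitions)      # one tag index per token
```

In the full model, the concatenated vectors feed the BiLSTM, whose outputs are projected to emission scores; the CRF's transition matrix is learned jointly and enforces valid tag sequences at decoding time.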
