Transfer Learning from Automatically Annotated Data for Recognizing Named Entities in Recently Generated Texts

In recent years, machine learning based systems have achieved impressive performance, but only when the training and test data come from the same distribution. Consequently, such systems often perform poorly on newly generated data, whose distribution differs from that of the existing training data. The named entity recognition (NER) task is particularly prone to this problem because new named entities are continually created over time, yet manually annotating every newly generated text is prohibitively costly. We therefore propose a method that reduces the cost of manual annotation for recently generated texts by exploiting a large amount of unlabeled data. We first automatically recognize named entities in the unlabeled data using a knowledge base (KB). Such automatic annotation of unstructured data is inexpensive, but it introduces considerable noise because the result is not gold-standard data. To overcome this problem, we apply a transfer learning approach that reduces the influence of the noise in the automatically annotated data: the automatically annotated data are first used to pre-train the NER model, and the model is then fine-tuned on the existing manually annotated data. We evaluate the proposed method on three datasets with different distributions, and the experimental results demonstrate that our approach improves NER performance on recent texts.
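The following is a minimal, self-contained sketch of the pipeline described above, not the authors' implementation. It assumes a dictionary-style KB mapping surface forms to entity types, and it substitutes a plain BiLSTM tagger trained with token-level cross-entropy for the BiLSTM-CRF architecture commonly used for NER; all names, sizes, hyperparameters, and the dummy data are illustrative placeholders.

```python
# Hypothetical sketch of the two-stage recipe: (0) KB-based automatic
# annotation, (1) pre-training on the noisy auto-labels, (2) fine-tuning
# on gold labels. PyTorch is an assumption; the paper's exact setup may differ.
import torch
import torch.nn as nn

def kb_annotate(tokens, kb):
    """Distant-supervision step: tag spans that exactly match a KB entry
    (greedy longest match, BIO scheme); everything else stays 'O'."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for j in range(min(len(tokens), i + 5), i, -1):  # longest span first
            span = " ".join(tokens[i:j])
            if span in kb:
                tags[i] = "B-" + kb[span]
                tags[i + 1:j] = ["I-" + kb[span]] * (j - i - 1)
                i = j
                break
        else:
            i += 1
    return tags

kb = {"New York": "LOC"}  # toy KB: surface form -> entity type
print(kb_annotate("She moved to New York".split(), kb))
# -> ['O', 'O', 'O', 'B-LOC', 'I-LOC']

class BiLSTMTagger(nn.Module):
    """Plain BiLSTM tagger; a CRF output layer is replaced by per-token
    cross-entropy here to keep the sketch dependency-free."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)  # (batch, seq_len, num_tags)

def train(model, batches, epochs, lr):
    """One training stage; the same routine serves pre-training and fine-tuning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding
    for _ in range(epochs):
        for token_ids, tag_ids in batches:
            opt.zero_grad()
            logits = model(token_ids)
            loss_fn(logits.view(-1, logits.size(-1)), tag_ids.view(-1)).backward()
            opt.step()

# Placeholder batches; in practice stage 1 reads the large KB-annotated
# corpus and stage 2 reads the small gold-standard corpus.
def dummy_batches(n):
    return [(torch.randint(1, 50_000, (8, 20)),
             torch.randint(0, 9, (8, 20))) for _ in range(n)]

model = BiLSTMTagger(vocab_size=50_000, num_tags=9)
# Stage 1: pre-train on noisy, automatically annotated data (large, cheap).
train(model, dummy_batches(100), epochs=5, lr=1e-3)
# Stage 2: fine-tune the same weights on manual annotations (small, clean),
# with a lower learning rate so the gold data refines rather than overwrites
# what was learned from the noisy corpus.
train(model, dummy_batches(10), epochs=20, lr=1e-4)
```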
