A crowd-efficient learning approach for NER based on online encyclopedia

Named Entity Recognition (NER) is a core task in NLP. State-of-the-art supervised NER models rely heavily on large amounts of high-quality annotated data, which is expensive to obtain. Various approaches have been proposed to reduce this reliance on large training sets, but with only limited effect. In this paper, we propose a crowd-efficient learning approach for supervised NER that makes full use of online encyclopedia pages. In our approach, we first define three criteria (representativeness, informativeness, diversity) to select a much smaller set of samples for crowd labeling. We then propose a data augmentation method that generates substantially more training data from the structured knowledge of the online encyclopedia, greatly amplifying the training effect. After training the model on the augmented sample set, we re-select new samples for crowd labeling to refine the model. We repeat this training and selection procedure iteratively until the model can no longer be improved or its performance meets our requirement. Our empirical study on several real data collections shows that our approach reduces manual annotation by 50% while achieving almost the same NER performance as a fully trained model.
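The three-criteria sample selection described above can be sketched as a greedy scoring loop. The concrete scoring functions below (entropy-based informativeness, corpus-frequency-based representativeness, and Jaccard-based diversity) and all function names are illustrative assumptions, not the paper's exact formulations:

```python
import math

def informativeness(probs):
    """Entropy of the model's predicted label distribution: higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def representativeness(sample_tokens, corpus_counts, corpus_total):
    """Average corpus frequency of the sample's tokens: higher = more typical."""
    return sum(corpus_counts.get(t, 0) / corpus_total for t in sample_tokens) / len(sample_tokens)

def diversity(sample_tokens, selected):
    """One minus the maximum Jaccard overlap with already-selected samples."""
    s = set(sample_tokens)
    if not selected:
        return 1.0
    return 1.0 - max(len(s & set(t)) / len(s | set(t)) for t in selected)

def select_batch(candidates, model_probs, corpus_counts, corpus_total, k,
                 weights=(1.0, 1.0, 1.0)):
    """Greedily pick k samples maximizing a weighted sum of the three criteria."""
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda c: (
                weights[0] * representativeness(c, corpus_counts, corpus_total)
                + weights[1] * informativeness(model_probs[tuple(c)])
                + weights[2] * diversity(c, selected)
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the full approach, each selected batch would be sent for crowd labeling, augmented with encyclopedia-derived training data, and used to retrain the model before the next selection round.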
