Named Entity Recognition with Partially Annotated Training Data

Supervised machine learning assumes the availability of fully labeled data, but in many cases, such as low-resource languages, the only data available is partially annotated. We study the problem of Named Entity Recognition (NER) with partially annotated training data, in which only a fraction of the named entities are labeled and all other tokens, entities or otherwise, are labeled as non-entities by default. To train on this noisy dataset, we need to distinguish between true and false negatives. To this end, we introduce a constraint-driven iterative algorithm that learns to detect false negatives in the noisy set and down-weight them, producing a weighted training set on which we train a weighted NER model. We evaluate our algorithm with weighted variants of neural and non-neural NER models on data in 8 languages from several language and script families, and show that these models learn effectively from partial data. Finally, to show real-world efficacy, we evaluate on a Bengali NER corpus annotated by non-speakers, where we outperform the prior state of the art by more than 5 F1 points.
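A minimal Python sketch of the down-weighting idea follows. It is not the paper's algorithm. Several parts are illustrative assumptions of our own: a per-word frequency model stands in for the NER model, each default-O token's weight is set to its estimated probability of being a true non-entity, and an assumed entity-token ratio (`entity_ratio`) serves as the constraint. The names `entity_probs` and `reweight` are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical token format: (word, tagged). tagged=True means an annotator
# marked the token as part of a named entity; every other token defaults to "O".
Token = Tuple[str, bool]


def entity_probs(tokens: List[Token], weights: List[float],
                 alpha: float = 1.0, prior: float = 0.1) -> Dict[str, float]:
    """Toy stand-in for a weighted NER model: smoothed P(entity | word).

    Tagged tokens count fully toward the entity class; default-O tokens count
    toward the non-entity class in proportion to their current weight.
    """
    ent: Dict[str, float] = defaultdict(float)
    non: Dict[str, float] = defaultdict(float)
    for (word, tagged), w in zip(tokens, weights):
        if tagged:
            ent[word] += 1.0
        else:
            non[word] += w
    vocab = set(ent) | set(non)
    # Additive smoothing toward `prior` so never-tagged words keep a small mass.
    return {v: (ent[v] + alpha * prior) / (ent[v] + non[v] + alpha) for v in vocab}


def reweight(tokens: List[Token], entity_ratio: float,
             iterations: int = 10) -> List[float]:
    """Iteratively down-weight default-O tokens that look like missed entities.

    entity_ratio is an assumed estimate of the fraction of entity tokens; it
    fixes how much expected entity mass the untagged tokens must absorb.
    """
    n_tagged = sum(1 for _, tagged in tokens if tagged)
    missing = max(0.0, entity_ratio * len(tokens) - n_tagged)
    weights = [1.0] * len(tokens)
    for _ in range(iterations):
        probs = entity_probs(tokens, weights)
        # Estimated P(entity) for each default-O token; tagged tokens are trusted.
        p = [0.0 if tagged else probs[word] for word, tagged in tokens]
        total = sum(p)
        scale = missing / total if total > 0 else 0.0
        # Rescale so the expected number of hidden entities matches the
        # constraint (clipping at 1 can leave it slightly short), then keep
        # each O token's weight as its probability of being a true non-entity.
        weights = [1.0 if tagged else 1.0 - min(1.0, scale * pi)
                   for (_, tagged), pi in zip(tokens, p)]
    return weights


if __name__ == "__main__":
    sentence = "Obama visited Paris . Obama met Merkel in Paris ."
    tagged_idx = {0, 2, 6}  # only some entity mentions were annotated
    tokens = [(word, i in tagged_idx) for i, word in enumerate(sentence.split())]
    for (word, tagged), w in zip(tokens, reweight(tokens, entity_ratio=0.5)):
        print(f"{word:8s} {'ENT' if tagged else 'O':3s} weight={w:.2f}")
```

In this toy run, the untagged second mentions of `Obama` and `Paris` should receive the lowest weights, because the same words were tagged as entities elsewhere. Function words keep weights close to 1. A real system would replace `entity_probs` with a weighted NER model. One option is to score each token with a model trained on other parts of the data, so that the model cannot simply reproduce its own noisy labels.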
