One Class per Named Entity: Exploiting Unlabeled Text for Named Entity Recognition

In this paper, we present a simple yet novel method of exploiting unlabeled text to further improve the accuracy of a high-performance state-of-the art named entity recognition (NER) system. The method utilizes the empirical property that many named entities occur in one name class only. Using only unlabeled text as the additional resource, our improved NER system achieves an F1 score of 87.13%, an improvement of 1.17% in F1 score and a 8.3% error reduction on the CoNLL 2003 English NER official test set. This accuracy places our NER system among the top 3 systems in the CoNLL 2003 English shared task.

[1]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[2]  David Yarowsky,et al.  One Sense per Collocation , 1993, HLT.

[3]  Ronald Rosenfeld,et al.  Semi-supervised learning with graphs , 2005 .

[4]  David Yarowsky,et al.  Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence , 1999, EMNLP.

[5]  Ralph Grishman,et al.  A Maximum Entropy Approach to Named Entity Recognition , 1999 .

[6]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Tong Zhang,et al.  Named Entity Recognition through Classifier Combination , 2003, CoNLL.

[8]  Tong Zhang,et al.  A High-Performance Semi-Supervised Learning Method for Text Chunking , 2005, ACL.

[9]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  R. Lathe Phd by thesis , 1988, Nature.

[11]  Hwee Tou Ng,et al.  Named Entity Recognition with a Maximum Entropy Approach , 2003, CoNLL.

[12]  Hwee Tou Ng,et al.  Named Entity Recognition: A Maximum Entropy Approach Using Global Information , 2002, COLING.

[13]  David Yarowsky,et al.  One Sense Per Discourse , 1992, HLT.

[14]  Claire Cardie,et al.  Limitations of Co-Training for Natural Language Learning from Large Datasets , 2001, EMNLP.

[15]  Yoram Singer,et al.  Unsupervised Models for Named Entity Classification , 1999, EMNLP.

[16]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[17]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.