Using Non-Local Features to Improve Named Entity Recognition Recall

Named Entity Recognition (NER) is always limited by its lower recall resulting from the asymmetric data distribution where the NONE class dominates the entity classes. This paper presents an approach that exploits non-local information to improve the NER recall. Several kinds of non-local features encoding entity token occurrence, entity boundary and entity class are explored under Conditional Random Fields (CRFs) framework. Experiments on SIGHAN 2006 MSRA (CityU) corpus indicate that non-local features can effectively enhance the recall of the state-of-the-art NER systems. Incorporating the non-local features into the NER systems using local features alone, our best system achieves a 23.56% (25.26%) relative error reduction on the recall and 17.10% (11.36%) relative error reduction on the F1 score; the improved F1 score 89.38% (90.09%) is significantly superior to the best NER system with F1 of 86.51% (89.03%) participated in the closed track.

[1]  Andrew McCallum,et al.  Collective Segmentation and Labeling of Distant Entities in Information Extraction , 2004 .

[2]  Razvan C. Bunescu,et al.  Collective Information Extraction with Relational Markov Networks , 2004, ACL.

[3]  Christopher D. Manning,et al.  An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition , 2006, ACL.

[4]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[5]  Nanda Kambhatla Minority Vote: At-Least-N Voting Improves Recall for Extracting Relations , 2006, ACL.

[6]  Jorge Nocedal,et al.  Representations of quasi-Newton matrices and their use in limited memory methods , 1994, Math. Program..

[7]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[8]  Gina-Anne Levow,et al.  The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition , 2006, SIGHAN@COLING/ACL.

[9]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[10]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[11]  Hwee Tou Ng,et al.  One Class per Named Entity: Exploiting Unlabeled Text for Named Entity Recognition , 2007, IJCAI.

[12]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.