Chinese Named Entity Recognition with a Sequence Labeling Approach: Based on Characters, or Based on Words?

Named Entity Recognition (NER), an important problem in Natural Language Processing, is the basis for other applications such as Data Mining and Relation Extraction. Using a sequence labeling approach, this paper asks which kind of token should serve as the unit of granularity in the NER task: characters or words. We use not only local context features within a sentence, but also global knowledge features extracted from other occurrences of each word in the whole corpus. The results show that, without the global features, person names and location names are recognized well with character-based labeling, while organization names are better suited to word-based labeling. When global features are added, the performance of the word-based approach improves significantly.
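To make the character-versus-word distinction concrete, the following minimal sketch labels the same sentence at both granularities. It is only an illustration under stated assumptions: the BIO tag scheme, the example sentence, its word segmentation, and the helper names `char_tags` and `word_tags` are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (start_char, end_char_exclusive, entity_type); a hypothetical span format.
Span = Tuple[int, int, str]


def char_tags(sentence: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign one BIO tag per character."""
    tags = ["O"] * len(sentence)
    for start, end, etype in spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return list(zip(sentence, tags))


def word_tags(words: List[str], spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign one BIO tag per word, assuming the words concatenate to the
    sentence and every entity boundary coincides with a word boundary."""
    result = []
    offset = 0
    for word in words:
        w_start, w_end = offset, offset + len(word)
        tag = "O"
        for start, end, etype in spans:
            if start <= w_start and w_end <= end:
                tag = ("B-" if w_start == start else "I-") + etype
                break
        result.append((word, tag))
        offset = w_end
    return result


# Made-up example: a person name and an organization name.
sentence = "张三在北京大学工作"
spans = [(0, 2, "PER"), (3, 7, "ORG")]
words = ["张三", "在", "北京", "大学", "工作"]  # assumed segmentation

char_level = char_tags(sentence, spans)
word_level = word_tags(words, spans)

for token, tag in char_level:
    print(token, tag)
for token, tag in word_level:
    print(token, tag)
```

In this sketch the character-based sequence has nine tokens and the word-based sequence has five. The word-based version depends on a segmenter placing boundaries that agree with the entity boundaries, which is one reason the choice of granularity can affect each entity type differently.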
