Paper Information - Chinese Named Entity Identification Using Class-based Language Model

We consider the problem of Chinese named entity (NE) identification using statistical language models (LMs). Word segmentation and NE identification are integrated into a unified framework that consists of several class-based language models. We also adopt a hierarchical structure for one of the LMs so that nested entities in organization names can be identified. Evaluation on a large test set shows consistent improvements. Our experiments further demonstrate improvements after integrating linguistic heuristic information, a cache-based model, and NE abbreviation identification.
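As a rough illustration of what a unified class-based formulation can look like, the sketch below writes joint segmentation and NE identification as a search for the best class sequence. The notation is assumed here and not taken from the abstract: S is the input character string, C = c_1 ... c_n is a candidate sequence of word classes (for example person, location, organization, or an ordinary lexicon word), and s_{c_i} is the substring covered by class c_i. The trigram order of the context model is likewise an assumption for illustration. This is a generic noisy-channel sketch, not necessarily the exact model used in the paper.

```latex
% Generic class-based decoding sketch (notation assumed, not from the abstract)
\begin{align}
  \hat{C} &= \arg\max_{C} P(C \mid S) = \arg\max_{C} P(C)\, P(S \mid C) \\
  P(C) &\approx \prod_{i=1}^{n} P(c_i \mid c_{i-2}, c_{i-1})
        && \text{(context model over classes)} \\
  P(S \mid C) &\approx \prod_{i=1}^{n} P(s_{c_i} \mid c_i)
        && \text{(one class model per entity type)}
\end{align}
```

Under this reading, each candidate segmentation of S corresponds to one class sequence C, so choosing the highest-scoring C fixes word boundaries and entity labels at the same time.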

Lei Zhang | Changning Huang | Jian Sun | Jianfeng Gao | Ming Zhou

[1] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[2] Shuanhu Bai, et al. Description of the Kent Ridge Digital Labs System Used for MUC-7, 1998, MUC.

[3] Richard M. Schwartz, et al. An Algorithm that Learns What's in a Name, 1999, Machine Learning.

[4] Nancy Chinchor, et al. Appendix E: MUC-7 Named Entity Task Definition (version 3.5), 1998, MUC.

[5] Ralph Grishman, et al. A Decision Tree Method for Finding and Classifying Names in Japanese Texts, 1998, VLC@COLING/ACL.

[6] Marc Moens, et al. Description of the LTG System Used for MUC-7, 1998, MUC.

[7] Jianfeng Gao, et al. Toward a unified approach to statistical language modeling for Chinese, 2002, TALIP.

[8] Renato De Mori, et al. A Cache-Based Natural Language Model for Speech Recognition, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Hsin-Hsi Chen, et al. Description of the NTU System used for MET-2, 1998, MUC.

[10] Ralph Grishman, et al. A Maximum Entropy Approach to Named Entity Recognition, 1999.