Paper information - Mongolian Named Entity Recognition using suffixes segmentation

Mongolian Named Entity Recognition using suffixes segmentation

Mongolian is an agglutinative language with complex morphological structure, which makes building an accurate Named Entity Recognition (NER) system both challenging and worthwhile. This paper analyzes the characteristics of Mongolian suffixes delimited by the Narrow No-Break Space (NNBSP) and investigates three methods for Mongolian NER within the Conditional Random Field (CRF) framework. Experiments show that segmenting each suffix into an individual token outperforms both leaving words unsegmented and using the suffixes as a feature. Our approach obtains an F-measure of 82.71 and is suited to large-vocabulary Mongolian NER. The findings may also be relevant to NER systems for other agglutinative languages.
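The three input representations compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to keep the NNBSP attached to each suffix token, and the placeholder strings (used instead of real Mongolian text) are all assumptions made for illustration. Only the use of U+202F NARROW NO-BREAK SPACE as the suffix delimiter comes from the abstract.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE, the suffix delimiter

def segment_suffixes(word):
    """Split one word at each NNBSP; the first piece is treated as the stem.

    Keeping the NNBSP at the front of each suffix token is an assumption,
    chosen so that suffix tokens remain distinguishable from stems.
    """
    stem, *suffixes = word.split(NNBSP)
    return [stem] + [NNBSP + s for s in suffixes if s]

def tokenize(sentence, segment=True):
    """Tokenize on ordinary spaces, optionally splitting off suffixes.

    Note: str.split() with no argument would also split on NNBSP, because
    Python treats U+202F as whitespace, so the separator is given explicitly.
    """
    tokens = []
    for word in sentence.split(" "):
        if not word:
            continue
        tokens.extend(segment_suffixes(word) if segment else [word])
    return tokens

def suffix_feature(word):
    """Alternative setup: keep the word whole and expose suffixes as a feature."""
    stem, *suffixes = word.split(NNBSP)
    return {"stem": stem, "suffix": "+".join(suffixes) or "NONE"}

if __name__ == "__main__":
    # Placeholder input: STEM/SFX stand in for real Mongolian stems and suffixes.
    sentence = "STEM1" + NNBSP + "SFX1 STEM2"
    print(tokenize(sentence, segment=False))         # no segmentation
    print([suffix_feature(w) for w in sentence.split(" ")])  # suffix as feature
    print(tokenize(sentence, segment=True))          # each suffix is a token
```

In the segmented setup each suffix receives its own label from the sequence model, whereas in the feature setup the word stays a single token and the suffix string is only one of its attributes.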
