Paper information - Research on recognition of semantic chunk boundary in Tibetan

Semantic chunks describe the semantic framework of a sentence well and play an important role in Natural Language Processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunks is mainly rule-based. In this paper, based on the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable feature templates from a set of candidates. In experiments on two kinds of Tibetan corpora, a sentence corpus and a discourse corpus, the F-measure reaches 95.84% and 94.95% with the Conditional Random Fields (CRF) model and 91.97% and 88.82% with the Maximum Entropy (ME) model. These positive results show that the definition of the Tibetan semantic chunk given in this paper is reasonable and operable, and that boundary recognition with statistical techniques is feasible and effective on a small-scale corpus.
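The abstract does not spell out the feature selection algorithm or the labeling scheme, so the following is only a minimal sketch of one common way such a pipeline could look. It assumes a B/I/O boundary labeling and a greedy forward search over candidate templates. The function names (`extract_chunks`, `f_measure`, `greedy_select`) and the template names (`w0`, `w-1`, `pos0`) are hypothetical and do not come from the paper. In practice, `evaluate` would train a CRF or ME model with the given templates and return its F-measure on held-out data.

```python
from typing import Callable, List, Sequence, Set, Tuple


def extract_chunks(tags: Sequence[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) spans of chunks in a B/I/O tag sequence (end exclusive)."""
    chunks: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                chunks.add((start, i))  # close the previous chunk
            start = i
        elif tag == "O":
            if start is not None:
                chunks.add((start, i))
            start = None
        # "I" continues the open chunk; a stray "I" with no open chunk is ignored
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def f_measure(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Chunk-level F-measure: a chunk counts only if both boundaries match."""
    gold_chunks = extract_chunks(gold)
    pred_chunks = extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_chunks)
    recall = correct / len(gold_chunks)
    return 2 * precision * recall / (precision + recall)


def greedy_select(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[str]:
    """Greedy forward selection (an assumed strategy, not the paper's algorithm).

    Repeatedly adds the candidate template that yields the highest score and
    stops as soon as no remaining template improves on the best score so far.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        scored = [(evaluate(selected + [t]), t) for t in remaining]
        score, template = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.append(template)
        remaining.remove(template)
        best = score
    return selected
```

A toy use: with `gold = ["B", "I", "O", "B", "I", "I"]` and `pred = ["B", "I", "O", "B", "I", "O"]`, only one of the two chunks has both boundaries right, so precision and recall are each one half. Greedy search needs one model training run per remaining candidate in every round, which stays affordable on a small corpus of the kind the abstract describes.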

Heyan Huang | Shumin Shi | Tianhang Wang | Congjun Long | Ruijing Li

[1] Chang Baobao. Semantic Role Classification Based on Peking University Chinese NetBank, 2011.

[2] Adwait Ratnaparkhi, et al. A Simple Introduction to Maximum Entropy Models for Natural Language Processing, 1997.

[3] Zhou Qiang. Chunk Parsing Scheme for Chinese Sentences, 1999.

[4] Huang Heyan. A Survey on Chinese Chunk Parsing, 2013.

[5] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[6] Steven Abney, et al. Parsing By Chunks, 1991.

[7] Huang De-gen. Chinese Functional Chunk Parsing Employing CRF and Semantic Information, 2011.

[8] Li Jihong, et al. Automatic Labeling of Chinese Functional Chunks Based on Conditional Random Fields Model, 2010.

[9] John D. Lafferty, et al. Inducing Features of Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Li Yumei. Chinese Chunk Parsing Evaluation Tasks, 2010.

[11] Yuan Yu-lin. The Fineness Hierarchy of Semantic Roles and its Application in NLP, 2007.

[12] Di Jiang, et al. Tibetan Word Segmentation Based on Word-Position Tagging, 2013, 2013 International Conference on Asian Language Processing.

[13] Li Li. Tibetan Functional Chunks Boundary Detection, 2013.

[14] Di Jiang, et al. The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms, 2013, 2013 International Conference on Asian Language Processing.