Character Tagging-Based Word Segmentation for Uyghur

To obtain information from Uyghur words effectively, we present a novel character-tagging method for Uyghur word segmentation. We propose five labels for the characters in a Uyghur word: Su, Bu, Iu, Eu, and Au. With these labels, we treat the segmentation of Uyghur words as a sequence labeling task and use Conditional Random Fields (CRFs) as the underlying labeling model. Experiments show that our method captures more features of Uyghur words and therefore significantly outperforms several traditionally used word segmentation models.
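
The idea of casting segmentation as character labeling can be illustrated with a minimal sketch. The abstract does not define the five labels, so the code below assumes, purely for illustration, that Bu, Iu, and Eu mark the first, inner, and last character of a segmentation unit and that Su marks a single-character unit; Au is left out because its meaning is not stated. The function names and the Latin-script example word are likewise hypothetical, and no CRF is trained here: the sketch only shows how a segmented word maps to a tag sequence and back, which is the form of data a sequence labeler such as a CRF would be trained to predict.

```python
from typing import List, Tuple

# ASSUMPTION: label meanings are guessed from common positional tagging
# schemes; the source does not define Su, Bu, Iu, Eu, or Au.
BEGIN, INSIDE, END, SINGLE = "Bu", "Iu", "Eu", "Su"


def units_to_tags(units: List[str]) -> List[Tuple[str, str]]:
    """Turn a word, given as a list of units, into (character, tag) pairs."""
    tagged = []
    for unit in units:
        if len(unit) == 1:
            tagged.append((unit, SINGLE))
            continue
        last = len(unit) - 1
        for i, ch in enumerate(unit):
            if i == 0:
                tag = BEGIN
            elif i == last:
                tag = END
            else:
                tag = INSIDE
            tagged.append((ch, tag))
    return tagged


def tags_to_units(tagged: List[Tuple[str, str]]) -> List[str]:
    """Recover the units from (character, tag) pairs."""
    units = []
    current = ""
    for ch, tag in tagged:
        current += ch
        # A unit closes on its last character or on a single-character unit.
        if tag in (END, SINGLE):
            units.append(current)
            current = ""
    if current:
        # Keep any trailing characters if the tag sequence ends mid-unit.
        units.append(current)
    return units


# Hypothetical example: a word split into a stem and a suffix.
pairs = units_to_tags(["kitab", "lar"])
print(pairs)
print(tags_to_units(pairs))
```

In a full system, the tag sequence would be predicted by the trained labeling model from character-level features, and a decoding step like `tags_to_units` would turn the predicted tags into the final segmentation.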
