Improving Sequence Tagging Using Machine-Learning Techniques

This paper presents an excellent sequence tagging approach based on combining several machine learning methods. First, conditional random fields (CRFs) are presented as a new kind of discriminative sequential model. CRFs can incorporate many rich features while avoiding the label bias problem, which limits maximum entropy Markov models (MEMMs) and other discriminative finite-state models. Second, support vector machines (SVMs) are adapted to the sequence tagging task. Finally, these improved models are combined with other existing models, achieving state-of-the-art performance. Experimental results show that the CRF approach yields improvements of 0.70% in part-of-speech (POS) tagging and 0.67% in shallow parsing. Moreover, the combined method achieves F-measures of 93.73% and 93.69% on these two tasks, respectively, outperforming every individual component model.
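The abstract does not say how the component taggers are combined. As an illustration only, the sketch below assumes the simplest common scheme: per-token weighted voting over the tag sequences produced by several models (for example a CRF, an SVM tagger and an MEMM). The function name `vote_tags`, its `weights` parameter and the tie-breaking rule are hypothetical choices for this sketch, not the paper's method.

```python
from typing import List, Optional, Sequence


def vote_tags(predictions: Sequence[Sequence[str]],
              weights: Optional[Sequence[float]] = None) -> List[str]:
    """Hypothetical combiner: per-token weighted voting over several taggers.

    predictions[m][i] is the tag that model m assigns to token i.
    weights[m] is the vote weight of model m (all 1.0 if omitted).
    Ties go to the tag proposed first, scanning models in list order.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must tag the same number of tokens")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")

    combined: List[str] = []
    for i in range(n_tokens):
        # Accumulate the vote mass each candidate tag receives at token i.
        scores: dict = {}
        for model_tags, w in zip(predictions, weights):
            tag = model_tags[i]
            scores[tag] = scores.get(tag, 0.0) + w
        # max() returns the first maximal key in dict insertion order,
        # which implements the tie-breaking rule described above.
        combined.append(max(scores, key=scores.get))
    return combined


# Usage: three hypothetical chunkers disagree on two of three tokens.
crf = ["B-NP", "I-NP", "O"]
svm = ["B-NP", "B-NP", "O"]
memm = ["B-NP", "I-NP", "B-VP"]
print(vote_tags([crf, svm, memm]))  # ['B-NP', 'I-NP', 'O']
```

Voting token by token ignores tag dependencies, so for chunking it can emit invalid BIO sequences (such as `I-NP` directly after `O`); a real combiner would typically repair or constrain the output, and the weights would be tuned on held-out data.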