An Integrated Deterministic and Nondeterministic Inference Algorithm for Sequential Labeling

In this paper, we present a new search algorithm for sequential labeling tasks based on the conditional Markov models (CMMs) frameworks. Unlike conventional beam search, our method traverses all possible incoming arcs and also considers the “local best” so-far of each previous node. Furthermore, we propose two heuristics to fit the efficiency requirement. To demonstrate the effect of our method, six variant and large-scale sequential labeling tasks were conducted in the experiment. In addition, we compare our method to Viterbi and Beam search approaches. The experimental results show that our method yields not only substantial improvement in runtime efficiency, but also slightly better accuracy. In short, our method achieves 94.49 F(β) rate in the well-known CoNLL-2000 chunking task.

[1]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[2]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[3]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[4]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[5]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[6]  Jun'ichi Tsujii,et al.  Evaluation and Extension of Maximum Entropy Models with Inequality Constraints , 2003, EMNLP.

[7]  Yuji Matsumoto,et al.  Chunking with Support Vector Machines , 2001, NAACL.

[8]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[9]  Nigel Collier,et al.  Use of Support Vector Machines in Extended Named Entity Recognition , 2002, CoNLL.

[10]  Daniel Marcu,et al.  Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.

[11]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[12]  Hwee Tou Ng,et al.  Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based? , 2004, EMNLP.

[13]  Sabine Buchholz,et al.  Introduction to the CoNLL-2000 Shared Task Chunking , 2000, CoNLL/LLL.

[14]  Stephen Clark,et al.  Chinese Segmentation with a Word-Based Perceptron Algorithm , 2007, ACL.

[15]  Lluís Màrquez i Villodre,et al.  SVMTool: A general POS Tagger Generator Based on Support Vector Machines , 2004, LREC.

[16]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[17]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[18]  Yue-Shi Lee,et al.  Robust and efficient multiclass SVM models for phrase pattern recognition , 2008, Pattern Recognit..

[19]  Jun Suzuki,et al.  Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data , 2008, ACL.

[20]  Jun Suzuki,et al.  Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach , 2007, EMNLP.