Discriminative Online Algorithms for Sequence Labeling-A Comparative Study

We describe a natural alternative for training sequence labeling models, based on MIRA (Margin Infused Relaxed Algorithm). In addition, we describe a novel method for performing Viterbi-like decoding. We test MIRA and contrast it with other training algorithms and contrast our decoding algorithm with the vanilla Viterbi algorithm.

[1]  Y. Singer,et al.  Ultraconservative online algorithms for multiclass problems , 2003 .

[2]  Rob Malouf,et al.  Markov Models for Language-independent Named Entity Recognition , 2002, CoNLL.

[3]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[4]  Fernando Pereira,et al.  Non-Projective Dependency Parsing using Spanning Tree Algorithms , 2005, HLT.

[5]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[6]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[7]  Marine Carpuat,et al.  A Stacked, Voted, Stacked Model for Named Entity Recognition , 2003, CoNLL.

[8]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[9]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[10]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[11]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[12]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[13]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[14]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[15]  Thomas Hofmann,et al.  Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences , 2003, EMNLP.

[16]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[17]  Dan Klein,et al.  Conditional Structure versus Conditional Estimation in NLP Models , 2002, EMNLP.

[18]  Walter Daelemans,et al.  MBT : Memory Based Tagger, version 1.0, Reference Guide , 2002 .