Paper Information - Discriminative Online Algorithms for Sequence Labeling-A Comparative Study

We describe a natural alternative for training sequence labeling models, based on MIRA (Margin Infused Relaxed Algorithm). In addition, we describe a novel method for performing Viterbi-like decoding. We test MIRA and contrast it with other training algorithms and contrast our decoding algorithm with the vanilla Viterbi algorithm.

Shay B. Cohen | Kevin Gimpel

[1] Y. Singer, et al. Ultraconservative online algorithms for multiclass problems, 2003.

[2] Rob Malouf, et al. Markov Models for Language-independent Named Entity Recognition, 2002, CoNLL.

[3] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.

[4] Fernando Pereira, et al. Non-Projective Dependency Parsing using Spanning Tree Algorithms, 2005, HLT.

[5] Michael Collins, et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[6] Adwait Ratnaparkhi, et al. A Maximum Entropy Model for Part-Of-Speech Tagging, 1996, EMNLP.

[7] Marine Carpuat, et al. A Stacked, Voted, Stacked Model for Named Entity Recognition, 2003, CoNLL.

[8] Koby Crammer, et al. Online Large-Margin Training of Dependency Parsers, 2005, ACL.

[9] Ben Taskar, et al. Max-Margin Markov Networks, 2003, NIPS.

[10] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[11] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[12] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[13] Andrew McCallum, et al. Maximum Entropy Markov Models for Information Extraction and Segmentation, 2000, ICML.

[14] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[15] Thomas Hofmann, et al. Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences, 2003, EMNLP.

[16] David Chiang, et al. Better k-best Parsing, 2005, IWPT.

[17] Dan Klein, et al. Conditional Structure versus Conditional Estimation in NLP Models, 2002, EMNLP.

[18] Walter Daelemans, et al. MBT: Memory Based Tagger, version 1.0, Reference Guide, 2002.