The Structured Weighted Violations Perceptron Algorithm

We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new structured prediction algorithm that generalizes the Collins Structured Perceptron (CSP). Unlike CSP, the update rule of SWVP explicitly exploits the internal structure of the predicted labels. We prove the convergence of SWVP for linearly separable training sets, provide mistake and generalization bounds, and show that in the general case these bounds are tighter than those of the CSP special case. In synthetic data experiments with data drawn from an HMM, various variants of SWVP substantially outperform its CSP special case. SWVP also provides encouraging initial dependency parsing results.

[1]  Yang Guo,et al.  Structured Perceptron with Inexact Search , 2012, NAACL.

[2]  Alexander M. Rush,et al.  On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing , 2010, EMNLP.

[3]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..

[4]  Fernando Pereira,et al.  Structured Learning with Approximate Inference , 2007, NIPS.

[5]  Fernando Pereira,et al.  Online Learning of Approximate Dependency Parsing Algorithms , 2006, EACL.

[6]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Marc Pollefeys,et al.  Efficient structured prediction for 3D indoor scene understanding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Alex Acero,et al.  Hidden conditional random fields for phone classification , 2005, INTERSPEECH.

[9]  Luke S. Zettlemoyer,et al.  Online Learning of Relaxed CCG Grammars for Parsing to Logical Form , 2007, EMNLP.

[10]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[12]  Alex Bateman,et al.  An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[13]  Noah A. Smith,et al.  Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers , 2013, ACL.

[14]  Tommi S. Jaakkola,et al.  Learning Efficiently with Approximate Inference via Dual Losses , 2010, ICML.

[15]  Tamir Hazan,et al.  A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction , 2010, NIPS.

[16]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[17]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[18]  Luke S. Zettlemoyer,et al.  Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations , 2011, ACL.

[19]  Michael Collins,et al.  Efficient Third-Order Dependency Parsers , 2010, ACL.

[20]  Yair Weiss,et al.  Minimizing and Learning Energy Functions for Side-Chain Prediction , 2008, J. Comput. Biol..

[21]  Daniel Marcu,et al.  Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.

[22]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[23]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[24]  Noah A. Smith,et al.  Structured Ramp Loss Minimization for Machine Translation , 2012, HLT-NAACL.

[25]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[26]  Fernando Pereira,et al.  Non-Projective Dependency Parsing using Spanning Tree Algorithms , 2005, HLT.

[27]  Yoram Singer,et al.  A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[28]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT' 98.

[29]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[30]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[31]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..

[32]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[33]  Tommi S. Jaakkola,et al.  Tightening LP Relaxations for MAP using Message Passing , 2008, UAI.

[34]  Stephen Clark,et al.  Joint Word Segmentation and POS Tagging Using a Single Perceptron , 2008, ACL.

[35]  Regina Barzilay,et al.  Multi-Event Extraction Guided by Global Constraints , 2012, NAACL.

[36]  Yoav Goldberg,et al.  An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing , 2010, NAACL.