Zipfian corruptions for robust POS tagging

Inspired by robust generalization and adversarial learning, we describe a novel approach to learning structured perceptrons for part-of-speech (POS) tagging that is less sensitive to domain shifts. The objective of our method is to minimize average loss under random distribution shifts. We restrict the possible target distributions to mixtures of the source distribution and random Zipfian distributions. Our algorithm is used for POS tagging and evaluated on the English Web Treebank and the Danish Dependency Treebank, where it achieves an average error reduction of 4.4% in tagging.
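The restriction to mixtures of the source distribution and random Zipfian distributions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `zipfian_weights` and `mix`, the exponent `s`, the mixing weight `alpha`, and the toy source distribution are hypothetical, and the abstract does not say how such distributions enter perceptron training.

```python
import random


def zipfian_weights(n, s=1.0, rng=None):
    """Return a Zipfian distribution over n items with randomly assigned ranks.

    The item given rank r receives probability proportional to 1 / r**s.
    (Hypothetical helper; the exponent s is an assumed parameter.)
    """
    rng = rng or random.Random()
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)  # random rank assignment makes the distribution "random"
    raw = [1.0 / (r ** s) for r in ranks]
    total = sum(raw)
    return [w / total for w in raw]


def mix(source, zipf, alpha):
    """Convex mixture (1 - alpha) * source + alpha * zipf (alpha is assumed)."""
    return [(1.0 - alpha) * p + alpha * q for p, q in zip(source, zipf)]


rng = random.Random(0)
source = [0.5, 0.3, 0.15, 0.05]  # toy source distribution over four items
zipf = zipfian_weights(len(source), s=1.0, rng=rng)
target = mix(source, zipf, alpha=0.3)
print(target)
```

Because both inputs are probability distributions and the mixture is convex, `target` is again a probability distribution. Drawing a fresh `zipf` on each call gives a different simulated target distribution each time.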
