Training a Log-Linear Parser with Loss Functions via Softmax-Margin

Log-linear parsing models are often trained by optimising likelihood, but we would prefer to optimise for a task-specific metric like F-measure. Softmax-margin is a convex objective for such models that minimises a bound on expected risk for a given loss function, but its naive application requires the loss to decompose over the predicted structure, which is not true of F-measure. We use softmax-margin to optimise a log-linear CCG parser for a variety of loss functions, and demonstrate a novel dynamic programming algorithm that enables us to use it with F-measure, leading to substantial gains in accuracy on CCGbank. When we embed our loss-trained parser into a larger model that includes supertagging features incorporated via belief propagation, we obtain further improvements and achieve a labelled/unlabelled dependency F-measure of 89.3%/94.0% on gold part-of-speech tags, and 87.2%/92.8% on automatic part-of-speech tags, the best reported results for this task.
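For orientation, a sketch of the softmax-margin objective in its standard form, using generic notation not taken from this paper (theta for the weight vector, f for the feature function, ell for the task loss, and Y(x) for the candidate structures of input x): it is the conditional log-likelihood with the loss added inside the log-partition term,

\min_{\theta} \; \sum_{i} \Big( -\theta^{\top} f(x_i, y_i) \;+\; \log \sum_{y \in \mathcal{Y}(x_i)} \exp\big( \theta^{\top} f(x_i, y) + \ell(y_i, y) \big) \Big),

typically with an L2 regulariser on theta. When ell decomposes over the same local parts as f (for example, a Hamming-style loss over spans or dependencies), the loss-augmented log-sum can be computed with the usual inside algorithm by adding each part's loss to its local score; F-measure does not decompose this way, which is the obstacle the dynamic programming algorithm described above addresses.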
