A Challenge Set and Methods for Noun-Verb Ambiguity

English part-of-speech taggers regularly make egregious errors related to noun-verb ambiguity, despite having achieved 97%+ accuracy on the WSJ Penn Treebank since 2002. These mistakes have been difficult to quantify and make taggers less useful to downstream tasks such as translation and text-to-speech synthesis. This paper creates a new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity. Taggers within 1% of each other when measured on the WSJ have accuracies ranging from 57% to 75% accuracy on this challenge set. Enhancing the strongest existing tagger with contextual word embeddings and targeted training data improves its accuracy to 89%, a 14% absolute (52% relative) improvement. Downstream, using just this enhanced tagger yields a 28% reduction in error over the prior best learned model for homograph disambiguation for textto-speech synthesis.

[1]  Dirk Hovy,et al.  Experiments with crowdsourced re-annotation of a POS tagging data set , 2014, ACL.

[2]  Noah A. Smith,et al.  Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold , 2017, ArXiv.

[3]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[4]  Slav Petrov,et al.  A Universal Part-of-Speech Tagset , 2011, LREC.

[5]  Krishnakumar Balasubramanian,et al.  Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels , 2010, J. Mach. Learn. Res..

[6]  Eliyahu Kiperwasser,et al.  Scheduled Multi-Task Learning: From Syntax to Translation , 2018, TACL.

[7]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[8]  Jan Niehues,et al.  Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning , 2017, WMT.

[9]  David Yarowsky,et al.  A corpus-based synthesizer , 1992, ICSLP.

[10]  Yejin Choi,et al.  Mise en Place: Unsupervised Interpretation of Instructional Recipes , 2015, EMNLP.

[11]  Yoshimasa Tsuruoka,et al.  Learning to Parse and Translate Improves Neural Machine Translation , 2017, ACL.

[12]  Luke S. Zettlemoyer,et al.  Human-in-the-Loop Parsing , 2016, EMNLP.

[13]  Timothy Dozat,et al.  Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task , 2017, CoNLL.

[14]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[15]  Jinho D. Choi Dynamic Feature Induction: The Last Gist to the State-of-the-Art , 2016, NAACL.

[16]  Christopher D. Manning Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? , 2011, CICLing.

[17]  Nizar Habash,et al.  CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , 2017, CoNLL.

[18]  Kyle Gorman,et al.  Improving homograph disambiguation with supervised machine learning , 2018, LREC.

[19]  Gonçalo Simões,et al.  Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings , 2018, ACL.

[20]  Udo Hahn,et al.  Semi-Supervised Active Learning for Sequence Labeling , 2009, ACL.

[21]  Carlos Guestrin,et al.  Anchors: High-Precision Model-Agnostic Explanations , 2018, AAAI.

[22]  Dan Roth,et al.  Margin-based active learning for structured predictions , 2010, Int. J. Mach. Learn. Cybern..

[23]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[24]  Rico Sennrich,et al.  Linguistic Input Features Improve Neural Machine Translation , 2016, WMT.

[25]  Jacob Andreas,et al.  Corpus Creation for New Genres: A Crowdsourced Approach to PP Attachment , 2010, Mturk@HLT-NAACL.

[26]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[27]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[28]  Percy Liang,et al.  Unsupervised Risk Estimation Using Only Conditional Independence Structure , 2016, NIPS.