Learning to Rank for Plausible Plausibility

Researchers illustrate improvements in contextual encoding strategies via resultant performance on a battery of shared Natural Language Understanding (NLU) tasks. Many of these tasks are of a categorical prediction variety: given a conditioning context (e.g., an NLI premise), provide a label based on an associated prompt (e.g., an NLI hypothesis). The categorical nature of these tasks has led to the common use of a cross-entropy log-loss objective during training. We suggest this loss is intuitively wrong when applied to plausibility tasks, where the prompt by design is neither categorically entailed nor contradictory given the context. Log-loss naturally drives models to assign scores near 0.0 or 1.0, in contrast to our proposed use of a margin-based loss. Following a discussion of our intuition, we describe a confirmation study based on an extreme, synthetically curated task derived from MultiNLI. We find that a margin-based loss leads to a more plausible model of plausibility. Finally, we illustrate improvements on the Choice of Plausible Alternatives (COPA) task through this change in loss.
