A Machine Learning Benchmark with Meaning: Learnability and Verb Semantics

Just over thirty years ago, the prospect of modelling human knowledge with parallel distributed processing systems, without explicit rules, became a real possibility. In the past five years we have seen remarkable progress, with artificial neural network (ANN) based systems able to solve previously difficult problems in many cognitive domains. Focusing on Natural Language Processing (NLP), we argue that this progress is in part illusory: the benchmarks that measure it have become task-oriented and have lost sight of the goal of modelling knowledge. Task-oriented benchmarks are not informative about the reasons machine learning succeeds or fails. We propose a new dataset in which the correct answers to entailments and grammaticality judgements depend crucially on specific items of knowledge about verb semantics, so that performance errors can be traced directly to deficiencies in knowledge. If this knowledge is not learnable from the provided input, then it must be supplied as an innate prior.
