ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation

Compositional generalization benchmarks seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.
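
To make the distinction concrete, consider the COGS paper's running example (the renderings below are an illustrative sketch, not the exact ReCOGS or variable-free formats): the sentence "A rose was helped by a dog." receives the LF rose(x_1) AND help.theme(x_3, x_1) AND help.agent(x_3, x_6) AND dog(x_6), where the numeric indices on the variables simply record token positions in the input. Renumbering those variables consistently (say, to x_1, x_2, x_3 in order of first mention) yields a semantically equivalent LF, so a model penalized for predicting the renumbered form is being judged on an incidental detail rather than on meaning. A variable-free rendering along the lines of help(agent = dog, theme = rose) removes the indices altogether, but also the explicit variables that the original LFs use to tie arguments to events.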

[1] P. Smolensky, et al. Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models, 2022, ArXiv.

[2] Alexander Koller, et al. Structural generalization is hard for sequence-to-sequence models, 2022, EMNLP.

[3] Navin Goyal, et al. When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks, 2022, EMNLP.

[4] Xinyun Chen, et al. Compositional Semantic Parsing with Large Language Models, 2022, ArXiv.

[5] P. Blunsom, et al. Revisiting the Compositional Generalization Abilities of Neural Sequence Models, 2022, ACL.

[6] Pawel Krzysztof Nowak, et al. Improving Compositional Generalization with Latent Structure and Data Augmentation, 2021, NAACL.

[7] Timothy J. O'Donnell, et al. Systematic Generalization with Edge Transformers, 2021, NeurIPS.

[8] Mirella Lapata, et al. Disentangled Sequence to Sequence Learning for Compositional Generalization, 2021, ACL.

[9] Christopher Potts, et al. ReaSCAN: Compositional Reasoning in Language Grounding, 2021, NeurIPS Datasets and Benchmarks.

[10] J. Schmidhuber, et al. The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers, 2021, EMNLP.

[11] J. Ainslie, et al. Making Transformers Solve Compositional Tasks, 2021, ACL.

[12] Kenny Smith, et al. Meta-Learning to Compositionally Generalize, 2021, ACL.

[13] Jacob Andreas, et al. Lexicon Learning for Few Shot Sequence Modeling, 2021, ACL.

[14] Michael Johnson. Compositionality, 2020, The Wiley Blackwell Companion to Semantics.

[15] Mirella Lapata, et al. Compositional Generalization via Semantic Tagging, 2020, EMNLP.

[16] Tal Linzen, et al. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation, 2020, EMNLP.

[17] Jonathan Berant, et al. Improving Compositional Generalization in Semantic Parsing, 2020, Findings of EMNLP.

[18] Jonathan Berant, et al. Span-based Semantic Parsing for Compositional Generalization, 2020, ACL.

[19] B. Lake, et al. A Benchmark for Systematic Generalization in Grounded Language Understanding, 2020, NeurIPS.

[20] Xiao Wang, et al. Measuring Compositional Generalization: A Comprehensive Method on Realistic Data, 2019, ICLR.

[21] Christopher Potts, et al. Posing Fair Generalization Tasks for Natural Language Inference, 2019, EMNLP.

[22] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.

[23] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.

[24] Marco Baroni, et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks, 2017, ICML.

[25] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[26] Mark Steedman, et al. Universal Semantic Parsing, 2017, EMNLP.

[27] Irene Heim. File Change Semantics and the Familiarity Theory of Definiteness, 2008.

[28] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.

[29] H. B. Curry, et al. Combinatory Logic, Volume I, 1961.

[30] Alexander Koller, et al. Compositional generalization with a broad-coverage semantic parser, 2022, STARSEM.

[31] Milton Stephen Seegmiller. Lexical insertion in a transformational grammar, 1983.

[32] Irene Heim. The semantics of definite and indefinite noun phrases: a dissertation, 1982.

[33] R. Montague. Formal philosophy; selected papers of Richard Montague, 1974.