An overview of Natural Language Inference Data Collection: The way forward?

Understanding a Natural Language (NL) sentence amounts to understanding its consequences as well as the sentences that it is a consequence of. As Cooper et al. (1996) aptly put it, ‘inferential ability is not only a central manifestation of semantic competence but is in fact centrally constitutive of it’. Given this claim, it is no surprise that the same authors argue that the best way to test the semantic adequacy of NLP systems is to look at how well they perform with regard to Natural Language Inference (NLI). It is very hard not to agree with this statement. Indeed, many researchers consider NLI to be the crux of computational semantics. This paper is about the datasets created to meet this need. In particular, we discuss the most common NLI resources, arguing that all of them a) fail to capture the wealth of inferential mechanisms present in NLI and b) seem to be driven by the dominant discourse in the field at the time of their creation. In light of these observations, we discuss the requirements that an adequate NLI platform must satisfy, both in terms of the range of inference patterns found in reasoning with NL and in terms of the range of data collection mechanisms needed to acquire these patterns.