Natural Language Inference for Portuguese Using BERT and Multilingual Information

Recognizing Textual Entailment, also known as inference recognition, aims to identify when the meaning of one piece of text contains the meaning of another. In this work, we investigate multiple approaches for recognizing inference in the ASSIN dataset, an entailment recognition corpus for Portuguese. We also investigate the effect of adding external training data in two forms: multilingual data and an automatically translated corpus. Using the multilingual pre-trained BERT model, our results outperform the current state of the art for the ASSIN corpus. Finally, we show that adding external data either did not improve the model's performance or yielded improvements that are not significant.
