SILT: Efficient transformer training for inter-lingual inference

The ability of transformers to perform precision tasks such as question answering, Natural Language Inference (NLI), or summarisation has established them as one of the leading paradigms for addressing Natural Language Processing (NLP) tasks. NLI is one of the best scenarios for testing these architectures, owing to the knowledge required to understand complex sentences and to establish relationships between a hypothesis and a premise. Nevertheless, these models struggle to generalise to other domains and to handle multilingual and inter-lingual scenarios. The leading approach in the literature to address these issues involves designing and training extremely large architectures, which leads to unpredictable behaviours and creates barriers that impede broad access and fine-tuning. In this paper, we propose a new architecture, the Siamese Inter-Lingual Transformer (SILT), which efficiently aligns multilingual embeddings for Natural Language Inference and allows unmatched language pairs to be processed. SILT leverages siamese pre-trained multilingual transformers with frozen weights, in which the two input sentences attend to each other before being combined through a matrix alignment method. The experimental results reported in this paper show that SILT drastically reduces the number of trainable parameters while enabling inter-lingual NLI and achieving state-of-the-art performance on common benchmarks. We make our code and dataset available at https://github.com/jahuerta92/siamese-inter-lingualtransformer .
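To give a concrete picture of the components named above, the sketch below shows one possible way to wire a siamese, frozen, multilingual encoder with cross-attention and an alignment step in PyTorch using a HuggingFace model. It is a minimal illustration only: the abstract does not specify the cross-attention or matrix-alignment operations, so the nn.MultiheadAttention cross-attention, the nn.Bilinear alignment stand-in, the mean pooling, and the xlm-roberta-base checkpoint are all assumptions, not the paper's actual implementation.

# Minimal sketch of a SILT-style siamese NLI model (illustrative assumptions,
# not the authors' implementation; see the lead-in above).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseInterLingualModel(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base", num_labels=3, num_heads=8):
        super().__init__()
        # Shared (siamese) multilingual encoder with frozen weights.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for param in self.encoder.parameters():
            param.requires_grad = False
        dim = self.encoder.config.hidden_size
        # Cross-attention so each sentence attends to the other (assumed form).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stand-in for the matrix alignment step: a bilinear interaction
        # between the two pooled representations, then a classifier.
        self.align = nn.Bilinear(dim, dim, dim)
        self.classifier = nn.Linear(dim, num_labels)

    def encode(self, inputs):
        # Token-level embeddings from the frozen encoder: (batch, seq, dim).
        return self.encoder(**inputs).last_hidden_state

    def forward(self, premise_inputs, hypothesis_inputs):
        p = self.encode(premise_inputs)
        h = self.encode(hypothesis_inputs)
        # Premise tokens attend to hypothesis tokens and vice versa.
        p_ctx, _ = self.cross_attn(p, h, h)
        h_ctx, _ = self.cross_attn(h, p, p)
        # Mean-pool the attended sequences and combine them.
        combined = self.align(p_ctx.mean(dim=1), h_ctx.mean(dim=1))
        # Logits over entailment / neutral / contradiction.
        return self.classifier(combined)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = SiameseInterLingualModel()
    # Inter-lingual pair: English premise, Spanish hypothesis.
    premise = tokenizer(["A man is playing a guitar."], return_tensors="pt", padding=True)
    hypothesis = tokenizer(["Un hombre toca un instrumento."], return_tensors="pt", padding=True)
    print(model(premise, hypothesis).shape)  # torch.Size([1, 3])

Only the cross-attention, alignment, and classifier parameters are trainable in this sketch, which reflects the abstract's claim of drastically reducing trainable parameters by freezing the pre-trained encoder.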
