LINDA: Unsupervised Learning to Interpolate in Natural Language Processing

Despite the success of mixup in data augmentation, its applicability to natural language processing (NLP) tasks has been limited due to the discrete and variable-length nature of natural languages. Recent studies have thus relied on domain-specific heuristics and manually crafted resources, such as dictionaries, in order to apply mixup in NLP. In this paper, we instead propose an unsupervised learning approach to text interpolation for data augmentation, which we refer to as “Learning to INterpolate for Data Augmentation” (LINDA). LINDA requires neither heuristics nor manually crafted resources; it learns to interpolate between any pair of natural language sentences over a natural language manifold. After empirically demonstrating LINDA’s interpolation capability, we show that it allows us to seamlessly apply mixup in NLP and leads to better generalization in text classification, both in-domain and out-of-domain.
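For reference, the mixup baseline that LINDA builds on interpolates continuous inputs and labels by a convex combination with a Beta-distributed mixing ratio. Below is a minimal sketch of that standard mixup formulation (Zhang et al., 2018), not of LINDA's learned text interpolation; the function name and the choice of alpha are illustrative.

```python
import numpy as np

def mixup(x_i, x_j, y_i, y_j, alpha=0.2):
    """Standard mixup: convex combination of two examples and their
    (one-hot) labels, with the ratio drawn from Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x_tilde = lam * x_i + (1.0 - lam) * x_j  # interpolated input
    y_tilde = lam * y_i + (1.0 - lam) * y_j  # interpolated soft label
    return x_tilde, y_tilde
```

This works directly for continuous inputs such as images; the difficulty the paper addresses is that discrete, variable-length sentences cannot be linearly mixed this way, which is why LINDA instead learns the interpolation over a natural language manifold.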
