NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, data cards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
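To make the distinction between the two kinds of operation concrete, the following is a minimal, self-contained Python sketch. It is an illustration only and does not reproduce the framework's actual interfaces: the names `expand_contractions`, `is_long`, and `split_by_filter`, the contraction table, and the five-token threshold are all hypothetical choices made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical names throughout; this is NOT NL-Augmenter's actual API.
# A tiny, case-sensitive lookup table chosen purely for illustration.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "It's": "It is"}


def expand_contractions(sentence: str) -> str:
    """Transformation: modify the input text while aiming to keep its meaning."""
    return " ".join(CONTRACTIONS.get(tok, tok) for tok in sentence.split())


def is_long(sentence: str, min_tokens: int = 5) -> bool:
    """Filter: a boolean test on a feature (here, whitespace token count)."""
    return len(sentence.split()) >= min_tokens


def split_by_filter(
    data: List[str], keep: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split a dataset into the examples that pass the filter and the rest."""
    selected = [x for x in data if keep(x)]
    rest = [x for x in data if not keep(x)]
    return selected, rest


data = ["I don't like rainy days at all.", "It's fine."]
augmented = [expand_contractions(s) for s in data]  # apply the transformation
selected, rest = split_by_filter(augmented, is_long)  # apply the filter
```

In this sketch a transformation maps a sentence to a modified sentence, whereas a filter maps a sentence to a boolean that is then used to partition the data into subsets for separate evaluation.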

Ondrej Dusek | Eduard Hovy | Niklas Muennighoff | Wanxiang Che | Gerard de Melo | Sebastian Gehrmann | Caroline Brun | Mayukh Das | Nafise Sadat Moosavi | Marco Antonio Sobrevilla Cabezudo | Andrey Lukyanenko | Vukosi Marivate | Thomas Scialom | Damien Sileo | Venelin Kovatchev | Simon Mille | Stefan Langer | Denis Kleyko | Kaizhao Liang | Yue Zhang | Simon Meoni | Saad Mahamood | Pierre Colombo | Emile Chapuis | Christian Clauss | T MukundVarma | Antoine Honore | Kalpesh Krishna | Marco Di Giovanni | Robin M. Schmidt | Paul-Alexis Dray | Shahab Raji | Gautier Dagan | Ananya B. Sai | Tatiana Ekeinhor | Jan Pfister | Anna Shvets | Usama Yaseen | Kaustubh D. Dhole | Fabrice Harel-Canada | Tshephisho Sefara | Aman Srivastava | Ian Berlot-Attwell | Marie Tolkiehn | Nivranshu Pasricha | Witold Wydmański | Maria Obedkova | Genta Indra Winata | Jing Zhang | Samson Tan | Jinho D. Choi | Ashutosh Kumar | Varun Gangal | Aadesh Gupta | Zhenhao Li | Abinaya Mahendiran | Ashish Srivastava | Tongshuang Wu | Jascha Sohl-Dickstein | Sebastian Ruder | Sajant Anand | Nagender Aneja | Rabin Banjade | Lisa Barthe | Hanna Behnke | Connor Boyle | Samuel Cahyawijaya | Mukund Choudhary | Filip Cornell | Tanay Dixit | Thomas Dopierre | Suchitra Dubey | Rishabh Gupta | Louanes Hamla | Sang Han | Ishan Jindal | Przemyslaw K. Joniak | Seungjae Ryan Lee | Corey James Levinson | Hualou Liang | Zhexiong Liu | Maxime Meyer | Afnan Mir | Timothy Sum Hon Mun | Kenton Murray | Marcin Namysl | Priti Oli | Richard Plant | Vinay Prabhu | Vasile Pais | Libo Qin | Pawan Kumar Rajpoot | Vikas Raunak | Roy Rinberg | Nicolas Roberts | Juan Diego Rodriguez | Claude Roux | S. VasconcellosP.H. | Saqib N. Shamsi | Xudong Shen | Haoyue Shi | Yiwen Shi | Nick Siegel | Jamie Simon | Chandan Singh | Roman Sitelew | Priyank Soni | Taylor Sorensen | William Soto | KV Aditya Srivatsa | Tony Sun | A Tabassum | Fiona Anting Tan | Ryan Teehan | Mo Tiwari | Athena Wang | Zijian Wang | Gloria Wang | Zijie J. Wang | Fuxuan Wei | Bryan Wilie | Xinyi Wu | Tianbao Xie | M. Yee