AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages

Dialogue generation is an important NLP task fraught with many challenges, and these challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. The datasets contain 9,000 turns in total (1,500 per language), translated from a portion of the English multi-domain MultiWOZ dataset. We then benchmark the datasets by investigating and analyzing the effectiveness of transfer learning with state-of-the-art (SoTA) deep monolingual models, DialoGPT and BlenderBot, which we compare with a simple seq2seq baseline using perplexity. In addition, we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). Our findings confirm the hypothesis that deep monolingual models learn some abstractions that generalize across languages: we observe human-like conversations, to different degrees, in 5 of the 6 languages. Nigerian Pidgin English has the most transferable properties, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints and demos on the HuggingFace Hub for public access.
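The models above are compared by perplexity. As a reminder of what that number measures, the minimal sketch below computes perplexity as the exponential of the mean per-token negative log-likelihood (natural log). The function name and the per-token loss values are hypothetical illustrations, not the paper's code or results. For a Hugging Face Transformers causal language model such as DialoGPT, the `loss` returned when `labels` are supplied is already this mean cross-entropy over the predicted tokens, so `math.exp(loss)` gives the same quantity.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per predicted token).

    token_nlls: natural-log negative log-likelihoods, one per token,
    e.g. the per-token cross-entropy of a language model on held-out turns.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Hypothetical per-token losses for two models on the same held-out turns.
baseline_nlls = [3.2, 2.9, 3.5, 3.1]
finetuned_nlls = [2.1, 1.8, 2.4, 2.0]

print(f"seq2seq baseline perplexity: {perplexity(baseline_nlls):.2f}")
print(f"fine-tuned model perplexity: {perplexity(finetuned_nlls):.2f}")
```

Lower is better: a model that assigns higher probability to the reference turns has a smaller mean loss and therefore a lower perplexity.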

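The human evaluation reports a human-likeness score by majority vote, the share of unanimous judgments, and inter-annotator agreement (IAA), without naming the agreement statistic here. The sketch below is one way to compute such numbers under stated assumptions: each single-turn response is rated by the same odd number of annotators with a binary label (the label strings are hypothetical), a response counts as human-like when a strict majority says so, the unanimous share is taken as a fraction of all rated responses, and Fleiss' kappa is used as one common IAA measure for more than two raters. The function names and toy ratings are illustrative, not the authors' exact protocol or data.

```python
from collections import Counter
from typing import List, Sequence, Tuple

HUMAN = "human-like"          # hypothetical label names
NOT_HUMAN = "not human-like"


def human_likeness(ratings: Sequence[Sequence[str]],
                   positive: str = HUMAN) -> Tuple[float, float]:
    """Return (majority share, unanimous share) of items rated `positive`.

    Both shares are fractions of all rated items (an assumption).
    """
    n_items = len(ratings)
    majority = sum(1 for votes in ratings if 2 * votes.count(positive) > len(votes))
    unanimous = sum(1 for votes in ratings if all(v == positive for v in votes))
    return majority / n_items, unanimous / n_items


def fleiss_kappa(ratings: Sequence[Sequence[str]], categories: Sequence[str]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(votes) != n_raters for votes in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # n_ij: how many raters assigned item i to category j
    counts: List[List[int]] = []
    for votes in ratings:
        c = Counter(votes)
        row = [c[cat] for cat in categories]
        if sum(row) != n_raters:
            raise ValueError("found a rating outside the given categories")
        counts.append(row)
    # Observed agreement: mean over items of the pairwise agreement P_i
    p_items = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the marginal category proportions p_j
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example: 4 responses, 3 hypothetical annotators each.
ratings = [
    [HUMAN, HUMAN, HUMAN],
    [HUMAN, HUMAN, NOT_HUMAN],
    [NOT_HUMAN, NOT_HUMAN, HUMAN],
    [HUMAN, HUMAN, HUMAN],
]
score, unanimous = human_likeness(ratings)
kappa = fleiss_kappa(ratings, [HUMAN, NOT_HUMAN])
print(f"human-likeness: {score:.1%}, unanimous: {unanimous:.1%}, Fleiss' kappa: {kappa:.3f}")
# prints: human-likeness: 75.0%, unanimous: 50.0%, Fleiss' kappa: 0.111
```

With an odd number of raters and two labels a strict majority always exists; an even number of raters would need an explicit tie-breaking rule. Fleiss' kappa corrects the observed agreement for agreement expected by chance, which is why it can be low (about 0.11 here) even when most individual votes coincide.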