Mini But Mighty: Efficient Multilingual Pretraining with Linguistically-Informed Data Selection
[1] David Ifeoluwa Adelani, et al. YOSM: A new Yoruba sentiment corpus for movie reviews, 2022, ArXiv.
[2] Vukosi Marivate, et al. Umsuka English - isiZulu Parallel Corpus, 2021.
[3] Ankur Bapna, et al. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets, 2021, TACL.
[4] David Ifeoluwa Adelani, et al. The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation, 2021, MTSUMMIT.
[5] Antonios Anastasopoulos, et al. BembaSpeech: A Speech Recognition Corpus for the Bemba Language, 2021, LREC.
[6] A. Öktem, et al. Gamayun - Language Technology for Humanitarian Response, 2020, 2020 IEEE Global Humanitarian Technology Conference (GHTC).
[7] Hong Qu, et al. KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi, 2020, COLING.
[8] Colin Raffel, et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer, 2020, NAACL.
[9] Dietrich Klakow, et al. Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages, 2020, EMNLP.
[10] Bonaventure F. P. Dossou, et al. FFR v1.1: Fon-French Neural Machine Translation, 2020, WINLP.
[11] Paul Rayson, et al. Igbo-English Machine Translation: An Evaluation Benchmark, 2020, ArXiv.
[12] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[13] P. A. Owolawi, et al. Part of Speech Tagging for Setswana African Language, 2019, 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC).
[14] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[15] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[16] Leland McInnes, et al. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, 2018, ArXiv.
[17] Patrick Littell, et al. URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors, 2017, EACL.
[18] Nkosikhona Dlamini, et al. Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African Languages, 2016, INTERSPEECH.
[19] Roald Eiselen, et al. Government Domain Named Entity Recognition for South African Languages, 2016, LREC.
[20] Ikechukwu E. Onyenwe, et al. Part-of-speech Tagset and Corpus Development for Igbo, an African Language, 2014, LAW@COLING.
[21] Jörg Tiedemann, et al. Parallel Data, Tools and Interfaces in OPUS, 2012, LREC.
[22] Saadat M. Alhashmi, et al. Sentiment analysis amidst ambiguities in YouTube comments on Yoruba language (Nollywood) movies, 2012, WWW.
[23] Gilles-Maurice de Schryver, et al. Data-Driven Part-of-Speech Tagging of Kiswahili, 2006, TSD.
[24] J. Greenberg, et al. Studies in African linguistic classification, 1957.
[25] Jimmy J. Lin, et al. Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages, 2021, MRL.
[26] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[27] John O. R. Aoga, et al. Part-of-Speech tagging of Yoruba Standard, Language of Niger-Congo family, 2013.