Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-studied task. Modern NLP models routinely achieve accuracies above 90% on many standard English text classification benchmarks (Xie et al., 2019; Yang et al., 2019; Zaheer et al., 2020). However, text classification in low-resource languages remains challenging due to the lack of annotated data. Although methods like weak supervision and crowdsourcing can ease the annotation bottleneck, the annotations they produce contain label noise, and models trained on noisy labels may not generalize well. To this end, a variety of noise-handling techniques have been proposed to alleviate the negative impact of annotation errors (for extensive surveys, see Hedderich et al., 2021; Algan & Ulusoy, 2021). In this work, we experiment with a group of standard noise-handling methods on text classification tasks with noisy labels. We study both simulated noise and realistic noise induced by weak supervision. Moreover, we find that task-adaptive pre-training techniques (Gururangan et al., 2020) are beneficial for learning with noisy labels.
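
As a rough illustration of the setup described above, the sketch below injects simulated symmetric label noise into a labeled dataset and then continues masked-language-model training on the task's raw text, i.e., task-adaptive pre-training in the sense of Gururangan et al. (2020). The model name (`xlm-roberta-base`), the stand-in corpus (`ag_news`), the 30% noise rate, and the output path are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: simulated symmetric label noise + task-adaptive
# pre-training (TAPT) via continued masked-language-model training.
# Model name, corpus, noise rate, and paths are illustrative assumptions.
import random

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # assumption: any multilingual encoder works here
NOISE_RATE = 0.3                 # fraction of labels flipped uniformly at random


def inject_symmetric_noise(example, num_labels, rng):
    """Flip the gold label to a different class with probability NOISE_RATE."""
    if rng.random() < NOISE_RATE:
        candidates = [c for c in range(num_labels) if c != example["label"]]
        example["label"] = rng.choice(candidates)
    return example


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
dataset = load_dataset("ag_news")  # stand-in; the paper uses African-language data
num_labels = dataset["train"].features["label"].num_classes

rng = random.Random(42)
noisy_train = dataset["train"].map(
    lambda ex: inject_symmetric_noise(ex, num_labels, rng)
)


# Task-adaptive pre-training: continue MLM training on the task's raw text.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)


mlm_data = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)
mlm_model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="tapt-checkpoint", num_train_epochs=1),
    train_dataset=mlm_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("tapt-checkpoint")
# The saved checkpoint would then initialize a sequence classifier that is
# fine-tuned on `noisy_train` with one of the noise-handling methods studied.
```

In the paper's setting, the MLM step would run on the unlabeled task corpus in the target African language, and the resulting checkpoint would initialize the classifier to which the noise-handling methods are applied.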

[1] Heike Adel et al. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios, 2021, NAACL.

[2] Dietrich Klakow et al. Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages, 2020, EMNLP.

[3] M. Zaheer et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.

[4] Gang Niu et al. Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning, 2020, NeurIPS.

[5] Doug Downey et al. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks, 2020, ACL.

[6] Aditya Krishna Menon et al. Does label smoothing mitigate label noise?, 2020, ICML.

[7] G. Algan et al. Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey, 2021, Knowledge-Based Systems.

[8] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[9] Xuanjing Huang et al. How to Fine-Tune BERT for Text Classification?, 2019, CCL.

[10] Quoc V. Le et al. Unsupervised Data Augmentation for Consistency Training, 2019, NeurIPS.

[11] Daniel Pressel et al. An Effective Label Noise Model for DNN Text Classification, 2019, NAACL.

[12] Masashi Sugiyama et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels, 2018, NeurIPS.

[13] Kevin Gimpel et al. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise, 2018, NeurIPS.

[14] Richard Nock et al. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach, 2017, CVPR.

[15] Jacob Goldberger et al. Training deep neural-networks based on unreliable labels, 2016, ICASSP.

[16] Sergey Ioffe et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.

[17] Xiang Zhang et al. Character-level Convolutional Networks for Text Classification, 2015, NIPS.

[18] Aditya Krishna Menon et al. Learning with Symmetric Label Noise: The Importance of Being Unhinged, 2015, NIPS.

[19] Dumitru Erhan et al. Training Deep Neural Networks on Noisy Labels with Bootstrapping, 2014, ICLR.

[20] Joan Bruna et al. Training Convolutional Networks with Noisy Labels, 2014, ICLR.

[21] Christopher Potts et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.

[22] Sebastian Ruder et al. BERT memorisation and pitfalls in low-resource scenarios, 2021, arXiv.