An Approach to Extractive Bangla Question Answering Based On BERT-Bangla And BQuAD
A language model trained on a well-organized Bangla dataset can play a significant role in improving informative question-answering systems in Bangla by addressing several issues faced in the development of such systems for resource-limited languages. Bidirectional Encoder Representations from Transformers (BERT) is a truly bidirectional language model introduced by Google that achieves state-of-the-art performance on natural language understanding tasks. In this paper, we introduce BERT-Bangla, a language model pre-trained on a large amount of unlabeled Bangla text. We evaluate BERT-Bangla on several Bangla NLP classification tasks and achieve better performance than existing Bangla language models. To address the need for a well-suited Bangla question-answering dataset, we develop BQuAD (Bangla Question Answering Dataset), comprising question-answer pairs and contexts covering topics from multiple domains. We fine-tune BERT-Bangla on BQuAD, producing an extractive question-answering model that processes a user-provided Bangla context and returns a specific answer to a given question.
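To make the described pipeline concrete, the sketch below shows how a BERT-based extractive question-answering model of this kind could be queried for inference, using the Hugging Face transformers library. This is a minimal illustration under assumptions: the checkpoint name "bert-bangla-bquad" is a hypothetical placeholder, not a published model identifier, and the abstract does not specify the authors' actual tooling.

    # Minimal inference sketch for an extractive QA model fine-tuned on
    # SQuAD-style (question, context, answer-span) data such as BQuAD.
    # NOTE: "bert-bangla-bquad" is a hypothetical checkpoint name.
    from transformers import pipeline

    qa = pipeline(
        "question-answering",       # extractive QA: predicts an answer span
        model="bert-bangla-bquad",  # assumed fine-tuned BERT-Bangla checkpoint
    )

    context = "..."   # user-provided Bangla passage
    question = "..."  # Bangla question about the passage

    # The model scores start and end positions over the context tokens and
    # returns the highest-scoring span as the extracted answer.
    result = qa(question=question, context=context)
    print(result["answer"], result["score"])

In this formulation, the fine-tuned model does not generate free-form text; it selects the span of the given context most likely to answer the question, which matches the extractive setup the abstract describes.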