Mirror Distillation Model with Focal Loss for Chinese Machine Reading Comprehension

Machine comprehension of text is a critical problem in natural language processing (NLP). However, extractive machine reading comprehension (MRC) tasks suffer from severe data imbalance: imbalanced data creates a mismatch between training and testing. Without balanced labels, the learning process tends to converge to a solution that is strongly biased towards the majority class. The commonly used cross-entropy criterion is accuracy-oriented and focuses mainly on the majority class to achieve high accuracy. Another problem is that training introduces only correct knowledge, which is not sufficient for accurate answer prediction. To address these issues, we propose replacing the standard cross-entropy objective with focal loss for data-imbalanced MRC tasks; focal loss uses weighting coefficients to adjust the relative weight of positive and negative samples and to distinguish easy samples from hard ones. We further propose mirror distillation, in which the same model serves as both teacher and student, so that the teacher can provide both correct and incorrect knowledge for the student to learn from, yielding better model generalization and more accurate answer prediction. Our results show that the mirror distillation method significantly improves model performance, and that focal loss effectively alleviates the data imbalance problem in the task.
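For reference, the standard focal loss of Lin et al. down-weights well-classified (easy) examples relative to cross-entropy. For a predicted probability $p_t$ of the true class, with class-balancing weight $\alpha_t$ and focusing parameter $\gamma$, it is commonly written as
\[
\mathrm{FL}(p_t) = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t),
\]
so that easy examples ($p_t \to 1$) contribute little to the loss while hard, misclassified examples dominate training; setting $\gamma = 0$ and $\alpha_t = 1$ recovers the usual cross-entropy. The exact weighting used in this work may differ from this generic form.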