From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension

Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. Recent approaches use training data only from a resource-rich language such as English to fine-tune large-scale cross-lingual pre-trained language models. Because of the large differences between languages, a model fine-tuned only on a source language may not perform well on target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches often fail to hit the ground-truth answers, the correct answers are frequently contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine-grained differences between the accurate answer and the other candidates. Extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.
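The abstract only names the two training objectives, so the following is a minimal illustrative sketch, in PyTorch, of how a recall-oriented hard-learning loss over top-k candidate spans and a precision-oriented answer-aware contrastive loss might look. The span-scoring scheme, the choice of k, the temperature, the use of a pooled question representation as the contrastive anchor, and all helper names (`topk_spans`, `hard_learning_loss`, `answer_aware_contrastive_loss`) are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of the two stages described in the abstract (assumptions,
# not the authors' code): stage 1 uses a hard-EM-style loss that only asks the
# gold span to be competitive with the current top-k candidates; stage 2 uses an
# InfoNCE-style contrastive loss that separates the gold answer from the other
# top-k candidates.

import torch
import torch.nn.functional as F


def topk_spans(start_logits, end_logits, k=5, max_len=30):
    """Enumerate candidate answer spans for one passage and keep the k best.

    start_logits, end_logits: (seq_len,) tensors.
    Returns a list of (start, end, score) tuples, score = start + end logit.
    """
    seq_len = start_logits.size(0)
    scores = start_logits.unsqueeze(1) + end_logits.unsqueeze(0)  # (seq_len, seq_len)
    candidates = []
    for s in range(seq_len):
        for e in range(s, min(s + max_len, seq_len)):
            candidates.append((s, e, scores[s, e]))
    candidates.sort(key=lambda x: x[2].item(), reverse=True)
    return candidates[:k]


def hard_learning_loss(start_logits, end_logits, gold_start, gold_end, k=5):
    """Stage 1 (recall): assumed hard-EM-style top-k objective.

    Rather than forcing the gold span to be the top-1 prediction, minimize the
    negative log-probability of the gold span relative to the top-k candidate
    set, so the loss is small once the gold answer is competitive with the
    current top-k (i.e. "the top-k contains the answer").
    """
    cands = topk_spans(start_logits, end_logits, k=k)
    gold_score = start_logits[gold_start] + end_logits[gold_end]
    # If the gold span is already among the top-k it appears twice; ignored
    # here for simplicity.
    cand_scores = torch.stack([c[2] for c in cands] + [gold_score])
    return -(gold_score - torch.logsumexp(cand_scores, dim=0))


def answer_aware_contrastive_loss(query_repr, gold_repr, neg_reprs, temperature=0.1):
    """Stage 2 (precision): assumed InfoNCE-style answer-aware contrastive loss.

    query_repr: (hidden,) pooled question representation (assumed anchor).
    gold_repr:  (hidden,) representation of the ground-truth answer span (positive).
    neg_reprs:  (n, hidden) representations of the other top-k candidates (negatives).
    """
    q = F.normalize(query_repr, dim=-1)
    pos = F.normalize(gold_repr, dim=-1)
    negs = F.normalize(neg_reprs, dim=-1)
    # Positive similarity first, then negatives; label 0 marks the positive.
    logits = torch.cat([(q * pos).sum().unsqueeze(0), negs @ q]) / temperature
    labels = torch.zeros(1, dtype=torch.long, device=q.device)
    return F.cross_entropy(logits.unsqueeze(0), labels)


if __name__ == "__main__":
    # Tiny smoke test with random tensors in place of model outputs.
    torch.manual_seed(0)
    start, end = torch.randn(50), torch.randn(50)
    print(hard_learning_loss(start, end, gold_start=10, gold_end=12))
    h = torch.randn(8, 768)
    print(answer_aware_contrastive_loss(h[0], h[1], h[2:]))
```

The intended division of labor in this sketch mirrors the abstract: the first loss widens the candidate beam so the correct answer is retained among the top-k, and the second loss then sharpens the ranking within that beam by contrasting the gold answer against the remaining candidates.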
