From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension

Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. Recent approaches use training data only from a resource-rich language such as English to fine-tune large-scale cross-lingual pre-trained language models. Because of the large differences between languages, a model fine-tuned only on a source language may not perform well on target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches often fail to hit the ground-truth answers, the correct answers are frequently contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine-grained differences between the accurate answer and the other candidates. Extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.
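The abstract only names the two training objectives, so the following is a minimal illustrative sketch, in PyTorch, of how a recall-oriented hard-learning loss over top-k candidate spans and a precision-oriented answer-aware contrastive loss might look. The span-scoring scheme, the choice of k, the temperature, the use of a pooled question representation as the contrastive anchor, and all helper names (`topk_spans`, `hard_learning_loss`, `answer_aware_contrastive_loss`) are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of the two stages described in the abstract (assumptions,
# not the authors' code): stage 1 uses a hard-EM-style loss that only asks the
# gold span to be competitive with the current top-k candidates; stage 2 uses an
# InfoNCE-style contrastive loss that separates the gold answer from the other
# top-k candidates.

import torch
import torch.nn.functional as F


def topk_spans(start_logits, end_logits, k=5, max_len=30):
    """Enumerate candidate answer spans for one passage and keep the k best.

    start_logits, end_logits: (seq_len,) tensors.
    Returns a list of (start, end, score) tuples, score = start + end logit.
    """
    seq_len = start_logits.size(0)
    scores = start_logits.unsqueeze(1) + end_logits.unsqueeze(0)  # (seq_len, seq_len)
    candidates = []
    for s in range(seq_len):
        for e in range(s, min(s + max_len, seq_len)):
            candidates.append((s, e, scores[s, e]))
    candidates.sort(key=lambda x: x[2].item(), reverse=True)
    return candidates[:k]


def hard_learning_loss(start_logits, end_logits, gold_start, gold_end, k=5):
    """Stage 1 (recall): assumed hard-EM-style top-k objective.

    Rather than forcing the gold span to be the top-1 prediction, minimize the
    negative log-probability of the gold span relative to the top-k candidate
    set, so the loss is small once the gold answer is competitive with the
    current top-k (i.e. "the top-k contains the answer").
    """
    cands = topk_spans(start_logits, end_logits, k=k)
    gold_score = start_logits[gold_start] + end_logits[gold_end]
    # If the gold span is already among the top-k it appears twice; ignored
    # here for simplicity.
    cand_scores = torch.stack([c[2] for c in cands] + [gold_score])
    return -(gold_score - torch.logsumexp(cand_scores, dim=0))


def answer_aware_contrastive_loss(query_repr, gold_repr, neg_reprs, temperature=0.1):
    """Stage 2 (precision): assumed InfoNCE-style answer-aware contrastive loss.

    query_repr: (hidden,) pooled question representation (assumed anchor).
    gold_repr:  (hidden,) representation of the ground-truth answer span (positive).
    neg_reprs:  (n, hidden) representations of the other top-k candidates (negatives).
    """
    q = F.normalize(query_repr, dim=-1)
    pos = F.normalize(gold_repr, dim=-1)
    negs = F.normalize(neg_reprs, dim=-1)
    # Positive similarity first, then negatives; label 0 marks the positive.
    logits = torch.cat([(q * pos).sum().unsqueeze(0), negs @ q]) / temperature
    labels = torch.zeros(1, dtype=torch.long, device=q.device)
    return F.cross_entropy(logits.unsqueeze(0), labels)


if __name__ == "__main__":
    # Tiny smoke test with random tensors in place of model outputs.
    torch.manual_seed(0)
    start, end = torch.randn(50), torch.randn(50)
    print(hard_learning_loss(start, end, gold_start=10, gold_end=12))
    h = torch.randn(8, 768)
    print(answer_aware_contrastive_loss(h[0], h[1], h[2:]))
```

The intended division of labor in this sketch mirrors the abstract: the first loss widens the candidate beam so the correct answer is retained among the top-k, and the second loss then sharpens the ranking within that beam by contrasting the gold answer against the remaining candidates.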
