Improving Neural Cross-Lingual Abstractive Summarization via Employing Optimal Transport Distance for Knowledge Distillation

Current state-of-the-art cross-lingual summarization models employ a multi-task learning paradigm, which operates on a shared vocabulary module and relies on the self-attention mechanism to attend over tokens in the two languages. However, the correlation learned by self-attention is often loose and implicit, and it is inefficient at capturing crucial cross-lingual representations between languages. The problem worsens for language pairs with distinct morphological or structural features, which makes cross-lingual alignment more challenging and leads to a drop in performance. To overcome this problem, we propose a novel Knowledge-Distillation-based framework for Cross-Lingual Summarization, which seeks to explicitly construct cross-lingual correlation by distilling the knowledge of a monolingual summarization teacher into a cross-lingual summarization student. Since the representations of the teacher and the student lie in two different vector spaces, we further propose a Knowledge Distillation loss based on Sinkhorn Divergence, an Optimal Transport distance, to estimate the discrepancy between the teacher and student representations. Owing to the intuitive geometric nature of Sinkhorn Divergence, the student model can effectively learn to align its cross-lingual hidden states with the teacher's monolingual hidden states, leading to a strong correlation between distant languages. Experiments on cross-lingual summarization datasets for pairs of distant languages demonstrate that our method outperforms state-of-the-art models in both high-resource and low-resource settings.
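As a rough illustration of the distillation objective described above, the sketch below computes a debiased Sinkhorn Divergence between a student's cross-lingual hidden states and a teacher's monolingual hidden states and treats it as a knowledge-distillation loss. This is a minimal PyTorch sketch under our own assumptions (uniform weights over time steps, a squared-Euclidean cost, log-domain Sinkhorn iterations); function names such as sinkhorn_divergence and the tensor shapes are illustrative and not taken from the paper's implementation.

import math
import torch

def sinkhorn_cost(x, y, eps=0.1, n_iters=50):
    """Entropy-regularized OT cost between point clouds x (n, d) and y (m, d)
    with uniform weights, computed via log-domain Sinkhorn iterations."""
    n, m = x.size(0), y.size(0)
    C = torch.cdist(x, y, p=2) ** 2           # squared-Euclidean cost matrix (n, m)
    log_a = torch.full((n,), -math.log(n))    # log of uniform source weights
    log_b = torch.full((m,), -math.log(m))    # log of uniform target weights
    f = torch.zeros(n)                        # dual potential for x
    g = torch.zeros(m)                        # dual potential for y
    for _ in range(n_iters):
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a[:, None], dim=0)
    # Recover the transport plan from the potentials and return <P, C>.
    P = torch.exp((f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :])
    return torch.sum(P * C)

def sinkhorn_divergence(x, y, eps=0.1, n_iters=50):
    """Debiased Sinkhorn Divergence: OT_eps(x, y) - 0.5 OT_eps(x, x) - 0.5 OT_eps(y, y)."""
    return (sinkhorn_cost(x, y, eps, n_iters)
            - 0.5 * sinkhorn_cost(x, x, eps, n_iters)
            - 0.5 * sinkhorn_cost(y, y, eps, n_iters))

# Hypothetical usage: hidden states of the cross-lingual student and the
# monolingual teacher for one example (sequence length x hidden size).
student_h = torch.randn(35, 512, requires_grad=True)
teacher_h = torch.randn(40, 512)
kd_loss = sinkhorn_divergence(student_h, teacher_h)
kd_loss.backward()  # gradients flow back into the student's hidden states

In practice such a loss term would be added to the student's summarization objective with a weighting coefficient; the debiasing terms keep the divergence non-negative and zero only when the two sets of hidden states match.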
