Multilingual AMR-to-Text Generation

Generating text from structured data is challenging because it requires bridging the gap between (i) structure and natural language (NL) and (ii) semantically underspecified input and fully specified NL output. Multilingual generation brings an additional challenge: generating into languages with varied word order and morphological properties. In this work, we focus on Abstract Meaning Representations (AMRs) as structured input, where previous research has overwhelmingly focused on generating only into English. We leverage advances in cross-lingual embeddings, pretraining, and multilingual models to create multilingual AMR-to-text models that generate into twenty-one different languages. On automatic metrics, our multilingual models surpass baselines that generate into a single language for eighteen languages. Using human evaluation, we analyse the ability of our multilingual models to accurately capture morphology and word order, and find that native speakers judge our generations to be fluent.
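
As a concrete illustration of the structured input involved (not taken from the paper itself), the Python sketch below shows a small AMR in PENMAN notation for the sentence "The boy wants to go", together with a deliberately naive linearisation of the kind commonly used to feed graphs into sequence-to-sequence models. The `linearize` helper and its variable-dropping heuristic are hypothetical simplifications for exposition, not the paper's preprocessing pipeline.

```python
import re

# An AMR for "The boy wants to go" in PENMAN notation.
# Variables (w, b, g) name graph nodes; the second ":ARG0 b"
# is a re-entrancy: the boy is both the wanter and the goer.
AMR = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

def linearize(amr: str) -> list[str]:
    """Drop variable names and split the PENMAN string into tokens.

    Removing variables (e.g. "w / ", "b / ") is a common simplification;
    re-entrancies such as the second ":ARG0 b" then lose their antecedent,
    which realistic preprocessing handles more carefully.
    """
    no_vars = re.sub(r"\b[a-z]\d* / ", "", amr)   # strip "w / ", "b / ", ...
    return re.findall(r"[()]|[^\s()]+", no_vars)  # keep parens as tokens

print(linearize(AMR))
# ['(', 'want-01', ':ARG0', '(', 'boy', ')',
#  ':ARG1', '(', 'go-02', ':ARG0', 'b', ')', ')']
```

Note that the graph carries no tense, number, or determiners: recovering such information is part of what the abstract means by generating fully specified NL output from semantically underspecified input, and it is precisely what becomes harder when the target language is morphologically rich.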
