Underspecified Universal Dependency Structures as Inputs for Multilingual Surface Realisation

In this paper, we present the datasets used in the Shallow and Deep Tracks of the First Multilingual Surface Realisation Shared Task (SR’18). For the Shallow Track, data has been released in ten languages: Arabic, Czech, Dutch, English, Finnish, French, Italian, Portuguese, Russian and Spanish. For the Deep Track, data has been released in three languages: English, French and Spanish. We describe in detail how the datasets were derived from the Universal Dependencies V2.0 treebanks, and report on an evaluation of the Deep Track input quality. In addition, we examine the motivation for, and likely usefulness of, deriving NLG inputs from annotations in resources originally developed for Natural Language Understanding (NLU), and assess whether the resulting inputs supply enough information of the right kind for surface realisation, the final stage in the NLG process.
