Mention Flags (MF): Constraining Transformer-based Text Generators

This paper focuses on Seq2Seq (S2S) constrained text generation, where the text generator must mention specific words, given as input to the encoder, in its generated output. Pre-trained S2S models such as T5, or models equipped with a copy mechanism, can be trained to copy surface tokens from the encoder to the decoder, but they cannot guarantee constraint satisfaction. Constrained decoding algorithms always produce hypotheses that satisfy all constraints; however, they are computationally expensive and can lower the quality of the generated text. In this paper, we propose Mention Flags (MF), which trace whether lexical constraints have been satisfied in the output generated so far by an S2S decoder. MF models are trained to keep generating tokens until all constraints are satisfied, yielding high constraint satisfaction. Our experiments on the Common Sense Generation task (CommonGen) (Lin et al., 2020), the End2end Data-to-Text task (E2ENLG) (Dušek et al., 2020) and the Novel Object Captioning task (nocaps) (Agrawal et al., 2019) show that MF models achieve higher constraint satisfaction and text quality than baseline models and other constrained text generation algorithms, reaching state-of-the-art performance on all three tasks, with a much lower run-time than constrained decoding algorithms. We also show that MF models work well in the low-resource setting.
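
The flag-tracing idea can be made concrete with a small sketch. The snippet below computes, at each decoding step, one flag per encoder token under an assumed three-state scheme (0 = not a constraint token, 1 = constraint not yet mentioned, 2 = constraint already mentioned); the function name, surface-string matching, and per-step recomputation are illustrative assumptions, not the paper's exact sub-token-level implementation. In the full model, these flags would be embedded and fed into the decoder's attention over the encoder input, and decoding continues while any flag is still 1.

```python
# Minimal sketch of one row of a mention-flag matrix, assuming a
# three-state scheme: 0 = not a constraint token, 1 = constraint not
# yet mentioned, 2 = constraint already mentioned. Surface-string
# matching is a simplification of the paper's sub-token handling.

def mention_flags(src_tokens, constraints, generated_so_far):
    """Return one flag per encoder token for the next decoding step."""
    mentioned = {t for t in generated_so_far if t in constraints}
    flags = []
    for tok in src_tokens:
        if tok not in constraints:
            flags.append(0)  # ordinary input token
        elif tok in mentioned:
            flags.append(2)  # constraint satisfied in the output so far
        else:
            flags.append(1)  # constraint still pending
    return flags


src = ["a", "dog", "catch", "frisbee"]
constraints = {"dog", "catch", "frisbee"}
print(mention_flags(src, constraints, ["the", "dog", "runs"]))
# -> [0, 2, 1, 1]: "dog" is satisfied; "catch" and "frisbee" are pending.
```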

[1] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.

[2] Marilyn A. Walker, et al. A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation, 2018, NAACL.

[3] Eduard H. Hovy, et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, 2003, NAACL.

[4] Yejin Choi, et al. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning, 2020, EMNLP.

[5] Changhan Wang, et al. Levenshtein Transformer, 2019, NeurIPS.

[6] Hang Li, et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning, 2016, ACL.

[7] Philip S. Yu, et al. KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning, 2020, AAAI.

[8] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[9] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.

[10] Mihir Kale, et al. Text-to-Text Pre-Training for Data-to-Text Tasks, 2020, INLG.

[11] Basura Fernando, et al. Guided Open Vocabulary Image Captioning with Constrained Beam Search, 2016, EMNLP.

[12] Catherine Havasi, et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.

[13] Ondrej Dusek, et al. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings, 2016, ACL.

[14] Claire Gardent, et al. The WebNLG Challenge: Generating Text from RDF Data, 2017, INLG.

[15] Basura Fernando, et al. SPICE: Semantic Propositional Image Caption Evaluation, 2016, ECCV.

[16] Xinlei Chen, et al. nocaps: novel object captioning at scale, 2019, ICCV.

[17] Stephen Wan, et al. ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning, 2021, EACL.

[18] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2015, CVPR.

[19] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[20] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[21] Kartikeya Upasani, et al. Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue, 2019, ACL.

[22] Jianfeng Gao, et al. VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training, 2020, ArXiv.

[23] Qun Liu, et al. Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search, 2017, ACL.

[24] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[25] David Vandyke, et al. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems, 2015, EMNLP.

[26] Shamil Chollampatt, et al. Lexically Constrained Neural Machine Translation with Levenshtein Transformer, 2020, ACL.

[27] Dan Klein, et al. Pragmatically Informative Text Generation, 2019, NAACL.

[28] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.

[29] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.

[30] Verena Rieser, et al. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge, 2019, Comput. Speech Lang.

[31] Verena Rieser, et al. Semantic Noise Matters for Neural Natural Language Generation, 2019, INLG.

[32] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.

[33] Mirella Lapata, et al. Data-to-Text Generation with Content Selection and Planning, 2018, AAAI.

[34] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, CVPR.