POS-Constrained Parallel Decoding for Non-autoregressive Generation

The multimodality problem has become a major challenge for existing non-autoregressive generation (NAG) systems. A common remedy is sequence-level knowledge distillation, which rebuilds the training dataset with an autoregressive generation model (hereinafter the “teacher AG”). The success of such methods largely rests on a latent assumption, namely that the teacher AG is superior to the NAG model. In this work, however, we experimentally show that this assumption does not always hold for text generation tasks such as text summarization and story ending generation. To offer a feasible solution to the multimodality problem of NAG, we propose incorporating linguistic structure (specifically, the Part-of-Speech sequence) into NAG inference instead of relying on a teacher AG. More concretely, the proposed POS-constrained Parallel Decoding (POSPD) method supplies a specific POS sequence to constrain the NAG model during decoding. Our experiments demonstrate that POSPD consistently improves NAG models on four text generation tasks, and to a greater extent than knowledge distillation. This observation validates the necessity of exploring alternatives to sequence-level knowledge distillation.
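
The abstract only sketches the method, so below is a minimal illustrative sketch (not the authors' implementation) of what POS-constrained parallel decoding could look like: given per-position logits from a single parallel decoding pass of a NAG model and a POS sequence supplied by a separate tagger, tokens whose POS class is incompatible with the constraint at each position are masked out before the argmax. All names (`pos_constrained_parallel_decode`, `tag_to_vocab_mask`) and the simplifying one-tag-per-token vocabulary partition are assumptions for illustration only.

```python
import torch

def pos_constrained_parallel_decode(logits, pos_tags, tag_to_vocab_mask):
    # logits:            [T, V] per-position token logits from one parallel pass of a NAG model
    # pos_tags:          length-T sequence of POS tags constraining each target position
    # tag_to_vocab_mask: maps a POS tag to a [V] boolean tensor that is True for
    #                    vocabulary items compatible with that tag (an assumed lookup)
    constrained = logits.clone()
    for t, tag in enumerate(pos_tags):
        incompatible = ~tag_to_vocab_mask[tag]
        constrained[t] = constrained[t].masked_fill(incompatible, float("-inf"))
    return constrained.argmax(dim=-1)  # [T] token ids, one per target position


# Toy usage with a 4-word vocabulary: {0: "the", 1: "cat", 2: "sleeps", 3: "runs"}
tag_to_vocab_mask = {
    "DET":  torch.tensor([True,  False, False, False]),
    "NOUN": torch.tensor([False, True,  False, False]),
    "VERB": torch.tensor([False, False, True,  True]),
}
logits = torch.randn(3, 4)  # stand-in for the NAG model's per-position logits
print(pos_constrained_parallel_decode(logits, ["DET", "NOUN", "VERB"], tag_to_vocab_mask))
```

In this toy setup, the decoded sentence is forced into the DET-NOUN-VERB pattern regardless of how the unconstrained logits would rank the vocabulary, which is the intended effect of constraining each position with a POS tag.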
