Cascaded Text Generation with Markov Transformers

The two dominant approaches to neural text generation are fully autoregressive models, decoded serially with beam search, and non-autoregressive models, decoded in parallel with no output dependencies. This work proposes an autoregressive model with sub-linear parallel-time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for producing high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification to standard autoregressive training, while showing a competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
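The parallel-decoding claim for bounded-context conditional random fields rests on the fact that the (max,+) product of edge-potential matrices is associative, so the Viterbi chain over a length-T sequence can be reduced in O(log T) parallel steps rather than T serial ones. The sketch below illustrates this for a first-order chain; it is a minimal reconstruction for intuition, not the paper's implementation (which builds on Torch-Struct and a full cascade), and the names `maxplus` and `parallel_viterbi_score` are ours.

```python
import torch

def maxplus(A, B):
    # (max,+) "matrix product": out[i, j] = max_k A[i, k] + B[k, j].
    # Associative, so chains of edge potentials can be combined in any order.
    return (A.unsqueeze(-1) + B.unsqueeze(-3)).amax(dim=-2)

def parallel_viterbi_score(phi):
    # phi: tensor of shape (T, K, K); phi[t, i, j] scores label i at position t
    # followed by label j at position t+1 in a first-order linear-chain CRF.
    # Binary tree reduction: logarithmic depth, each level parallel over pairs.
    mats = list(phi)
    while len(mats) > 1:
        paired = [maxplus(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2 == 1:
            paired.append(mats[-1])
        mats = paired
    # Best total score, maximizing over the first and last labels.
    return mats[0].amax()

# Example: 8 positions, 5 labels; score of the highest-scoring label sequence.
phi = torch.randn(8, 5, 5)
print(parallel_viterbi_score(phi))
```

A cascaded decoder would repeat this kind of parallel computation at increasing Markov orders, using max-marginals from order m to prune the candidates scored at order m+1, with the Markov transformer supplying the potentials at each context cutoff.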
