Cascaded Text Generation with Markov Transformers

The two dominant approaches to neural text generation are fully autoregressive models, decoded serially with beam search, and non-autoregressive models, decoded in parallel with no output dependencies. This work proposes an autoregressive model with sub-linear parallel-time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for producing high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification to standard autoregressive training, while showing a competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
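The parallel-decoding claim for bounded-context conditional random fields rests on the fact that the (max,+) product of edge-potential matrices is associative, so the Viterbi chain over a length-T sequence can be reduced in O(log T) parallel steps rather than T serial ones. The sketch below illustrates this for a first-order chain; it is a minimal reconstruction for intuition, not the paper's implementation (which builds on Torch-Struct and a full cascade), and the names `maxplus` and `parallel_viterbi_score` are ours.

```python
import torch

def maxplus(A, B):
    # (max,+) "matrix product": out[i, j] = max_k A[i, k] + B[k, j].
    # Associative, so chains of edge potentials can be combined in any order.
    return (A.unsqueeze(-1) + B.unsqueeze(-3)).amax(dim=-2)

def parallel_viterbi_score(phi):
    # phi: tensor of shape (T, K, K); phi[t, i, j] scores label i at position t
    # followed by label j at position t+1 in a first-order linear-chain CRF.
    # Binary tree reduction: logarithmic depth, each level parallel over pairs.
    mats = list(phi)
    while len(mats) > 1:
        paired = [maxplus(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2 == 1:
            paired.append(mats[-1])
        mats = paired
    # Best total score, maximizing over the first and last labels.
    return mats[0].amax()

# Example: 8 positions, 5 labels; score of the highest-scoring label sequence.
phi = torch.randn(8, 5, 5)
print(parallel_viterbi_score(phi))
```

A cascaded decoder would repeat this kind of parallel computation at increasing Markov orders, using max-marginals from order m to prune the candidates scored at order m+1, with the Markov transformer supplying the potentials at each context cutoff.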
