Controlled Text Generation as Continuous Optimization with Multiple Constraints

As large-scale language-model pretraining pushes the state of the art in text generation, recent work has turned to controlling attributes of the text such models generate. While fine-tuning the pretrained models remains the most popular approach, it incurs a significant computational cost and can be infeasible when appropriate training data are unavailable. As an alternative, we propose MUCOCO, a flexible and modular algorithm for controllable inference from pretrained models. We formulate the decoding process as an optimization problem in which the multiple attributes we aim to control can be easily incorporated as differentiable constraints. By relaxing this discrete optimization to a continuous one, we can use Lagrange multipliers and gradient-descent-based techniques to generate the desired text. We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes and observe significant improvements over the baselines.
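The primal-dual recipe described above can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a differentiable fluency objective `lm_loss` (e.g., a language model's negative log-likelihood over a relaxed sequence) and a list of differentiable constraint functions with thresholds; it relaxes the discrete output to continuous per-position embeddings and alternates gradient descent on the sequence with projected gradient ascent on the Lagrange multipliers. All names here (`mucoco_style_decode`, `lm_loss`, `constraints`) are hypothetical.

```python
import torch

def mucoco_style_decode(lm_loss, constraints, seq_len, embed_dim,
                        steps=200, lr_x=0.1, lr_lam=0.05):
    """Hypothetical sketch of decoding as constrained continuous optimization.

    lm_loss(x)  -> scalar fluency objective over the relaxed sequence x
    constraints -> list of (g, eps); constraint i is satisfied when g(x) <= eps
    """
    # Continuous relaxation of the discrete output: one soft embedding per position.
    x = torch.randn(seq_len, embed_dim, requires_grad=True)
    # One Lagrange multiplier per constraint, kept non-negative.
    lam = torch.zeros(len(constraints))

    for _ in range(steps):
        # Constraint violations: positive values mean the constraint is unmet.
        violations = torch.stack([g(x) - eps for g, eps in constraints])
        # Lagrangian: fluency objective plus multiplier-weighted violations.
        lagrangian = lm_loss(x) + (lam * violations).sum()

        # Primal step: gradient descent on the relaxed sequence.
        (grad_x,) = torch.autograd.grad(lagrangian, x)
        with torch.no_grad():
            x -= lr_x * grad_x
            # Dual step: gradient ascent on the multipliers, projected onto >= 0,
            # so multipliers grow only while their constraints remain violated.
            lam = torch.clamp(lam + lr_lam * violations.detach(), min=0.0)

    # The relaxed solution must still be mapped back to discrete tokens,
    # e.g., by nearest-neighbor search against the model's embedding table.
    return x.detach()
```

In such a setup, `lm_loss` would score the soft embeddings under a pretrained language model (for instance, by feeding them in place of the embedding layer's output), and each constraint `g` could be an attribute classifier operating on the same relaxed representation, so that all terms remain differentiable in `x`.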
