CTRL: A Conditional Transformer Language Model for Controllable Generation

Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
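As a minimal sketch of the two capabilities described above (the notation below is ours, not taken from the abstract): a control code c is prepended to a token sequence x = (x_1, ..., x_n), and the model is trained on the standard autoregressive factorization of the conditional distribution,

    p_\theta(x \mid c) = \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i}, c).

The same learned distribution supports model-based source attribution: for a given sequence x, candidate control codes (i.e., training-data sources) can be ranked by Bayes' rule,

    \hat{c} = \arg\max_{c} \; p_\theta(x \mid c)\, p(c),

where p(c) is a prior over sources; the abstract does not specify this prior, so any particular choice (e.g., uniform or empirical) should be treated as an assumption.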
