On-the-Fly Controlled Text Generation with Experts and Anti-Experts

Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DEXPERTS (Decoding-time Experts), a decoding-time method for controlled text generation that combines a pretrained language model with "experts" and/or "anti-experts" in an ensemble of language models. Intuitively, under our ensemble, output tokens only get high probability if they are considered likely by the experts and unlikely by the anti-experts. We apply DEXPERTS to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Our work highlights the promise of using LMs trained on text with (un)desired attributes for efficient decoding-time controlled language generation.
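To make the ensemble concrete, below is a minimal decoding-step sketch in Python (PyTorch + HuggingFace transformers). The specific combination rule z_tilde = z_base + alpha * (z_expert - z_antiexpert), the steering weight alpha, and the GPT-2 placeholder checkpoints are illustrative assumptions, not details given in the abstract; the abstract only states that tokens should score highly when the experts consider them likely and the anti-experts consider them unlikely.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: in practice the expert/anti-expert would be LMs
# fine-tuned on text with the desired/undesired attribute (e.g., a
# non-toxic vs. a toxic corpus); GPT-2 stands in for all three here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")
expert = AutoModelForCausalLM.from_pretrained("gpt2")       # placeholder
antiexpert = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder

@torch.no_grad()
def dexperts_step(input_ids, alpha=1.0):
    """One decoding step of the expert/anti-expert ensemble (assumed form)."""
    z_base = base(input_ids).logits[:, -1, :]         # pretrained LM logits
    z_plus = expert(input_ids).logits[:, -1, :]       # expert logits
    z_minus = antiexpert(input_ids).logits[:, -1, :]  # anti-expert logits
    # Assumed combination: boost tokens the expert favors and the
    # anti-expert disfavors; alpha controls the steering strength.
    z_tilde = z_base + alpha * (z_plus - z_minus)
    probs = F.softmax(z_tilde, dim=-1)
    return torch.multinomial(probs, num_samples=1)    # sample the next token

# Usage: extend a prompt one token at a time.
input_ids = tokenizer("The weather today is", return_tensors="pt").input_ids
for _ in range(20):
    next_token = dexperts_step(input_ids, alpha=2.0)
    input_ids = torch.cat([input_ids, next_token], dim=-1)
print(tokenizer.decode(input_ids[0]))
```

Because only the output logits are combined, the pretrained base LM never needs retraining, and the steering strength can be adjusted at decoding time through alpha.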
