ControlVAE: Controllable Variational Autoencoder

Variational autoencoders (VAEs) and their variants are widely used in applications such as dialog generation, image generation, and disentangled representation learning. However, existing VAE models have limitations that vary by application. For example, a VAE easily suffers from KL vanishing in language modeling and from low reconstruction quality in disentangling. To address these issues, we propose a novel controllable variational autoencoder framework, ControlVAE, that combines a controller, inspired by automatic control theory, with the basic VAE to improve the performance of the resulting generative models. Specifically, we design a new non-linear PI controller, a variant of proportional-integral-derivative (PID) control, to automatically tune the hyperparameter (weight) added to the VAE objective, using the output KL divergence as feedback during model training. The framework is evaluated on three applications: language modeling, disentangled representation learning, and image generation. The results show that ControlVAE achieves better disentangling and reconstruction quality than existing methods. For language modeling, it not only averts KL vanishing but also improves the diversity of generated text. Finally, we demonstrate that ControlVAE improves the reconstruction quality of generated images compared to the original VAE.
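
The feedback loop described above can be illustrated with a minimal sketch. The abstract does not give the controller equations, so everything here is an assumption made for illustration: the sigmoid-shaped proportional term, the conditional-integration anti-windup rule, and the names `PIController`, `kp`, `ki`, `beta_min`, `beta_max`, and `target_kl` are hypothetical and are not taken from the text.

```python
import math


class PIController:
    """Hypothetical PI-style controller for the KL weight (called beta here).

    Assumed form, not confirmed by the abstract:
        error = target_kl - observed_kl
        beta  = kp / (1 + exp(error)) - ki * sum(error) + beta_min
    """

    def __init__(self, kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0):
        self.kp = kp
        self.ki = ki
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.integral = 0.0  # running value of -ki * sum of past errors

    def update(self, target_kl, observed_kl):
        """Return the KL weight to use for the next training step."""
        error = target_kl - observed_kl
        # Non-linear proportional term, bounded in (0, kp).
        # The error is capped before exp() only to avoid float overflow.
        p_term = self.kp / (1.0 + math.exp(min(error, 50.0)))
        # Integral term: grows while the observed KL exceeds the target.
        candidate_integral = self.integral - self.ki * error
        beta = p_term + candidate_integral + self.beta_min
        # Simple anti-windup: keep the new integral only while beta is in range.
        if self.beta_min <= beta <= self.beta_max:
            self.integral = candidate_integral
        # Clamp the returned weight to [beta_min, beta_max].
        return min(max(beta, self.beta_min), self.beta_max)
```

In a training loop, one would call `update` once per step with the KL divergence measured on the current batch and use the returned value as the weight on the KL term of the objective. Under these assumptions, the weight rises while the KL divergence stays above the target and falls toward `beta_min` when the KL divergence drops below it, which is the behavior one would want in order to counter KL vanishing.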
