
Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis

Developing conditional generative models for text-to-video synthesis is an extremely challenging yet important research topic in machine learning. In this work, we address this problem by introducing the Text-Filter conditioning Generative Adversarial Network (TFGAN), a conditional GAN model with a novel multi-scale text-conditioning scheme that improves text-video associations. By combining the proposed conditioning scheme with a deep GAN architecture, TFGAN generates high-quality videos from text on challenging real-world video datasets. In addition, we construct a synthetic dataset of text-conditioned moving shapes to systematically evaluate our conditioning scheme. Extensive experiments demonstrate that TFGAN significantly outperforms existing approaches and can also generate videos of novel categories not seen during training.
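The "discriminative filter generation" named in the title can be illustrated with a minimal NumPy sketch, under stated assumptions. A text embedding is mapped by a hypothetical linear layer to a bank of convolution filters. Those filters are convolved with discriminator feature maps at several scales, and the pooled responses are combined into one real/fake logit. The function names (`make_filter_generator`, `generate_filters`, `conv2d_valid`, `text_conditioned_logit`), the mean pooling, the per-scale linear heads, and all sizes are illustrative assumptions, not the authors' implementation. For brevity, 2D feature maps stand in for the video features a real video discriminator would use.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_filter_generator(text_dim, num_filters, channels, k):
    # Hypothetical linear generator: text embedding -> flattened filter bank.
    W = rng.normal(scale=0.1, size=(text_dim, num_filters * channels * k * k))
    return W, (num_filters, channels, k, k)

def generate_filters(text_emb, generator):
    # Produce text-dependent conv filters of shape (F, C, k, k).
    W, shape = generator
    return (text_emb @ W).reshape(shape)

def conv2d_valid(feat, filters):
    """'Valid' cross-correlation of a (C, H, W) map with (F, C, k, k) filters -> (F, H-k+1, W-k+1)."""
    num_f, c, k, _ = filters.shape
    assert feat.shape[0] == c, "filter channels must match feature channels"
    _, h, w = feat.shape
    out = np.zeros((num_f, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = feat[:, i:i + k, j:j + k]  # (C, k, k) window
            # Dot every filter with the window, contracting over C, k, k -> (F,)
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def text_conditioned_logit(feat_maps, text_emb, generators, heads):
    # Sum per-scale scores from text-generated filters applied to discriminator features.
    logit = 0.0
    for feat, gen, head in zip(feat_maps, generators, heads):
        filters = generate_filters(text_emb, gen)
        response = conv2d_valid(feat, filters)   # (F, H', W') text-feature response
        pooled = response.mean(axis=(1, 2))      # (F,) global average pool
        logit += float(pooled @ head)            # hypothetical linear head per scale
    return logit

# Usage with two hypothetical feature scales from a discriminator.
text_dim, num_filters, k = 16, 4, 3
channels, sizes = [8, 16], [16, 8]
text_emb = rng.normal(size=text_dim)
feat_maps = [rng.normal(size=(c, s, s)) for c, s in zip(channels, sizes)]
generators = [make_filter_generator(text_dim, num_filters, c, k) for c in channels]
heads = [rng.normal(size=num_filters) for _ in channels]
score = text_conditioned_logit(feat_maps, text_emb, generators, heads)
print(score)
```

In this sketch the filters are a function of the text, so the discriminator's response depends jointly on the visual features and the sentence; repeating the operation on feature maps of different resolutions is one way to read "multi-scale".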
