NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Zhe Gan, Nan Duan, Chenfei Wu, Yuejian Fang, Lijuan Wang, Zicheng Liu, Jianfeng Wang, Jian Liang, Xiaowei Hu