Optimal Linear Subspace Search: Learning to Construct Fast and High-Quality Schedulers for Diffusion Models

In recent years, diffusion models have become the most popular and powerful methods in image synthesis, even rivaling human artists in artistic creativity. However, a key issue limiting their application is their extremely slow generation process. Although several methods have been proposed to speed up generation, a trade-off between efficiency and quality remains. In this paper, we first provide a detailed theoretical and empirical analysis of the scheduler-based generation process of diffusion models. We reduce the design of a scheduler to the determination of several parameters, and further recast the accelerated generation process as an expansion process of a linear subspace. Based on these analyses, we propose a novel method called Optimal Linear Subspace Search (OLSS), which accelerates generation by searching for the optimal approximation of the complete generation process within the linear subspaces spanned by latent variables. OLSS is able to generate high-quality images with a very small number of steps. To demonstrate the effectiveness of our method, we conduct extensive comparative experiments on open-source diffusion models. Experimental results show that, for a given number of steps, OLSS significantly improves the quality of generated images. Using an NVIDIA A100 GPU, OLSS enables Stable Diffusion to generate a high-quality image within only one second, without any other optimization techniques.
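The core idea of approximating a step of the full generation process by its best element in the subspace spanned by previously computed latent variables can be sketched as a least-squares problem. The following is a minimal illustrative sketch, not the paper's actual algorithm: the function name, the choice of spanning vectors (earlier latents plus denoiser outputs), and the use of a plain NumPy least-squares solve are all assumptions made for illustration.

```python
import numpy as np

def fit_step_coefficients(latents, model_outputs, target):
    """Fit coefficients so that a linear combination of previously
    visited latents and denoiser outputs best approximates a target
    latent taken from the full (many-step) generation process.

    latents:       (k, d) array - latent variables from earlier steps
    model_outputs: (k, d) array - denoiser outputs at those steps
    target:        (d,)   array - reference latent from the full process
    Returns the coefficients and the best approximation in the subspace.
    """
    # Stack the vectors that span the linear subspace.
    basis = np.vstack([latents, model_outputs])          # shape (2k, d)
    # Solve min_w || basis.T @ w - target ||_2 by least squares.
    coeffs, *_ = np.linalg.lstsq(basis.T, target, rcond=None)
    # Project the target onto the subspace: the optimal approximation.
    approx = coeffs @ basis
    return coeffs, approx
```

If the reference latent lies close to the span of the stored variables, the projection error is small, which is the premise that lets a short search-derived scheduler imitate the complete process in few steps.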
