Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

Guided sampling, which embeds human-defined guidance into the sampling procedure, is a vital approach for applying diffusion models to real-world tasks. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during diffusion sampling, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance, together with a novel training objective named contrastive energy prediction (CEP) for learning it. Our method is guaranteed to converge to the exact guidance given unlimited model capacity and data samples, whereas previous methods are not. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks show that our method outperforms existing state-of-the-art algorithms. We also provide examples of applying CEP to image synthesis to demonstrate its scalability to high-dimensional data.
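As a rough illustration of the contrastive idea, the sketch below computes a cross-entropy between two softmax distributions over a small batch of samples: one built from known energies of clean samples, the other from energies predicted by a model for their noised versions. This is a toy under stated assumptions, not the paper's exact objective. The function names, the inverse temperature `beta`, and the batch layout are all hypothetical, and the abstract itself does not specify the form of the loss.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))


def contrastive_energy_loss(pred_energies, true_energies, beta=1.0):
    """Toy contrastive loss over a batch of K samples (assumed form).

    pred_energies: model outputs for the K noised samples (hypothetical).
    true_energies: known energies of the K clean samples.
    beta: assumed inverse temperature scaling the true energies.
    """
    pred = np.asarray(pred_energies, dtype=float)
    true = np.asarray(true_energies, dtype=float)
    # Soft labels: lower true energy means higher target probability.
    target = np.exp(log_softmax(-beta * true))
    # Cross-entropy between the target and the predicted distribution.
    return float(-np.sum(target * log_softmax(-pred)))


if __name__ == "__main__":
    true_e = [0.0, 1.0, 2.0]
    print(contrastive_energy_loss(true_e, true_e))
    print(contrastive_energy_loss([2.0, 1.0, 0.0], true_e))
```

In this toy form the loss is smallest when the predicted energies match the scaled true energies up to an additive constant, which is consistent with learning an unnormalized energy.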
