Paper Information - ChiroDiff: Modelling chirographic data with Diffusion Models

Generative modelling over continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches, and drawings, has been accomplished through autoregressive distributions. Such strictly ordered discrete factorization, however, falls short of capturing key properties of chirographic data: it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, namely "Denoising Diffusion Probabilistic Models" or DDPMs, for chirographic data that specifically addresses these flaws. Our model, named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show that some unique use cases, such as stochastic vectorization, de-noising/healing, and abstraction, are also possible with this model class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and find it to be better than or on par with competing approaches.

Timothy M. Hospedales | Yongxin Yang | Ayan Das | Tao Xiang | Yi-Zhe Song
