ChiroDiff: Modelling chirographic data with Diffusion Models

Generative modelling over continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches, drawings etc., has been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization, however, falls short of capturing key properties of chirographic data: it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model-class, namely "Denoising Diffusion Probabilistic Models" or DDPMs, for chirographic data that specifically addresses these flaws. Our model, named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show that some unique use-cases like stochastic vectorization, de-noising/healing and abstraction are also possible with this model-class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and find it to be better than or on par with competing approaches.
