SVGformer: Representation Learning for Continuous Vector Graphics using Transformers

Advances in representation learning have led to great success in understanding and generating data across various domains. In modeling vector graphics data, however, purely data-driven approaches often yield unsatisfactory results on downstream tasks, as existing deep learning methods typically require quantizing SVG parameters and cannot explicitly exploit geometric properties. In this paper, we propose a transformer-based representation learning model, SVGformer, that operates directly on continuous input values and leverages the geometric information of SVGs to encode both outline details and long-range dependencies. SVGformer can be used for various downstream tasks, including reconstruction, classification, interpolation, and retrieval. Extensive experiments on vector font and icon datasets show that our model captures high-quality representations and significantly outperforms the previous state of the art on downstream tasks.
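The central idea the abstract describes, feeding continuous SVG command parameters to a transformer encoder rather than quantized tokens, can be illustrated with a minimal sketch. This is not the paper's actual architecture: the command vocabulary, argument layout, module names, and hyperparameters below are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation): a transformer
# encoder that embeds continuous SVG command arguments with a linear projection
# instead of quantizing coordinates into a discrete vocabulary.
import torch
import torch.nn as nn

class ContinuousSVGEncoder(nn.Module):
    def __init__(self, n_cmd_types=4, n_args=8, d_model=256,
                 n_heads=8, n_layers=6, max_len=512):
        super().__init__()
        # Discrete command type, e.g. MoveTo / LineTo / CubicBezier / ClosePath.
        self.cmd_embed = nn.Embedding(n_cmd_types, d_model)
        # Continuous arguments (control-point coordinates) are projected
        # directly; no quantization of the coordinate values.
        self.arg_proj = nn.Linear(n_args, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, cmd_ids, args, pad_mask=None):
        # cmd_ids: (B, L) int64; args: (B, L, n_args) float32 in [0, 1].
        x = self.cmd_embed(cmd_ids) + self.arg_proj(args)
        x = x + self.pos_embed[:, : x.size(1)]
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        return h.mean(dim=1)  # pooled sequence-level representation

# Example: a batch of 2 paths, 16 commands each, 8 continuous args per command.
cmds = torch.randint(0, 4, (2, 16))
args = torch.rand(2, 16, 8)
rep = ContinuousSVGEncoder()(cmds, args)  # shape (2, 256)
```

The pooled representation would then feed the downstream tasks the abstract lists (reconstruction, classification, interpolation, retrieval); how the paper injects explicit geometric information into the attention is not specified here and is omitted from the sketch.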
