VecFusion: Vector Font Generation with Diffusion

We present VecFusion, a new neural architecture that can generate vector fonts with varying topological structures and precise control point positions. Our approach is a cascaded diffusion model which consists of a raster diffusion model followed by a vector diffusion model. The raster model generates low-resolution, rasterized fonts with auxiliary control point information, capturing the global style and shape of the font, while the vector model synthesizes vector fonts conditioned on the low-resolution raster fonts from the first stage. To synthesize long and complex curves, our vector diffusion model uses a transformer architecture and a novel vector representation that enables the modeling of diverse vector geometry and the precise prediction of control points. Our experiments show that, in contrast to previous generative models for vector graphics, our new cascaded vector diffusion model generates higher quality vector fonts, with complex structures and diverse styles.
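
The two-stage cascade described above can be illustrated with a minimal sketch. It assumes standard DDPM ancestral sampling for both stages. Everything else is hypothetical. `toy_glyph` is an invented 8×8 two-channel raster: a filled square, plus a second channel marking its four corners, which stands in for the "auxiliary control point information". `control_points_from_raster` is an invented helper. The trained raster model and the transformer vector model are replaced by "oracle" noise predictors that already know the clean target. The sketch therefore shows only the data flow, in which the stage-2 vector sample is conditioned on the stage-1 raster output. It is not VecFusion's actual architecture or vector representation.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule: betas, alphas and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ddpm_sample(predict_eps, shape, cond, schedule, rng):
    """Ancestral DDPM sampling; predict_eps(x_t, t, cond) returns the predicted noise."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = predict_eps(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise with variance beta_t on every step except the last.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x


def oracle_eps(x0, alpha_bars):
    """Hypothetical stand-in for a trained network: exact noise for a known clean target x0."""
    def predict(x_t, t, cond):
        return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])
    return predict


def toy_glyph(size=8):
    """Hypothetical raster: channel 0 = filled square, channel 1 = its corner markers."""
    img = np.zeros((2, size, size))
    img[0, 2:6, 2:6] = 1.0
    for r, c in [(2, 2), (2, 5), (5, 2), (5, 5)]:
        img[1, r, c] = 1.0
    return img


def control_points_from_raster(raster):
    """Hypothetical helper: (row, col) of pixels marked in the control-point channel."""
    return np.argwhere(raster[1] > 0.5).astype(float)


def vector_oracle(alpha_bars):
    """Hypothetical conditional stand-in: its clean target is read off the raster condition."""
    def predict(x_t, t, cond):
        return oracle_eps(control_points_from_raster(cond), alpha_bars)(x_t, t, cond)
    return predict


def cascaded_sample(raster_target, schedule, rng):
    _, _, alpha_bars = schedule
    # Stage 1: raster diffusion -> low-resolution glyph plus control-point channel.
    raster = ddpm_sample(oracle_eps(raster_target, alpha_bars),
                         raster_target.shape, None, schedule, rng)
    # Stage 2: vector diffusion over control-point coordinates, conditioned on the raster.
    n_points = len(control_points_from_raster(raster))
    points = ddpm_sample(vector_oracle(alpha_bars), (n_points, 2), raster, schedule, rng)
    return raster, points


rng = np.random.default_rng(0)
raster, points = cascaded_sample(toy_glyph(), make_schedule(), rng)
print(np.round(points, 3))
```

With exact noise predictions, the final DDPM step returns the clean target, so the second stage recovers the four corner coordinates marked in the first-stage raster. In a real system, both `predict_eps` functions would be learned networks, and the vector stage would have to cope with imperfect rasters.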