Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator

Automatic font generation remains a challenging research problem due to the large number of characters with complicated structures. Typically, only a few samples are available as the style/content reference (termed few-shot learning), which further increases the difficulty of preserving local style patterns or detailed glyph structures. We investigate the drawbacks of previous studies and find that a coarse-grained discriminator is insufficient for supervising a font generator. To this end, we propose a novel Component-Aware Module (CAM), which supervises the generator to decouple content and style at a finer granularity, i.e., the component level. Unlike previous studies that strive to increase the complexity of the generator, we aim to provide more effective supervision for a relatively simple generator so that it reaches its full potential, which is a brand-new perspective on font generation. The whole framework achieves remarkable results by coupling component-level supervision with adversarial learning; hence we call it Component-Guided GAN, or CG-GAN for short. Extensive experiments show that our approach outperforms state-of-the-art one-shot font generation methods. Moreover, it can be applied to handwritten word synthesis and scene text image editing, demonstrating the generalizability of our approach.
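To make the core idea concrete, below is a minimal PyTorch sketch of component-level supervision coupled with adversarial learning. This is an illustrative approximation, not the authors' exact CAM: the layer shapes, the multi-label component head, the hinge-style adversarial term, and the num_components vocabulary size are all assumptions introduced for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ComponentAwareDiscriminator(nn.Module):
    """Discriminator that, beyond a real/fake score, predicts which
    components a glyph contains, so the generator receives
    fine-grained (component-level) feedback."""
    def __init__(self, num_components: int = 500):
        super().__init__()
        # Shared convolutional feature extractor over grayscale glyph images.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv2d(256, 1, 3, padding=1)  # patch-wise real/fake scores
        self.comp_head = nn.Linear(256, num_components)  # multi-label component logits

    def forward(self, x):
        feat = self.backbone(x)
        adv = self.adv_head(feat)        # coarse adversarial signal
        pooled = feat.mean(dim=(2, 3))   # global average pooling -> (B, 256)
        comp = self.comp_head(pooled)    # component-level signal
        return adv, comp

def generator_loss(adv_fake, comp_logits, comp_targets):
    """Hinge-style adversarial loss plus a multi-label component loss;
    the component term is what pushes supervision below the
    whole-glyph level."""
    adv_loss = -adv_fake.mean()
    comp_loss = F.binary_cross_entropy_with_logits(comp_logits, comp_targets)
    return adv_loss + comp_loss

In practice, the multi-hot component targets for a Chinese character can be derived from a standard component/radical decomposition table, so this finer-grained supervision requires no additional human annotation.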
