HiGAN+: Handwriting Imitation GAN with Disentangled Representations

Humans remain far better than machines at learning: they need fewer examples to acquire new concepts and can use those concepts in richer ways. Take handwriting as an example. After seeing only a few handwritten samples, a person can easily imagine what handwritten text would look like with arbitrary other content, even words or texts never seen before. Humans can also imagine and imitate a calligraphic style from a single reference sample, even one they have never seen before. They can do so perhaps because they learn to disentangle textual content from calligraphic style in handwriting images. Inspired by this, we propose a novel handwriting imitation generative adversarial network (HiGAN+) for realistic handwritten text synthesis based on disentangled representations. HiGAN+ achieves precise one-shot handwriting style transfer by introducing a writer-specific auxiliary loss and a contextual loss, and it attains good global and local consistency by refining the local details of synthetic handwriting images. Extensive experiments on the benchmark dataset, including human evaluations, validate the superiority of HiGAN+ in terms of visual quality, scalability, compactness, and style transferability compared with state-of-the-art GANs for handwritten text synthesis.
