Graph Jigsaw Learning for Cartoon Face Recognition

Cartoon face recognition is challenging because cartoon faces typically have smooth color regions and emphasized edges; the key to recognizing them is to precisely perceive their sparse and critical shape patterns. However, it is difficult to learn a shape-oriented representation for cartoon face recognition with convolutional neural networks (CNNs). To mitigate this issue, we propose GraphJigsaw, which constructs jigsaw puzzles at various stages of the classification network and solves them with a graph convolutional network (GCN) in a progressive manner. Because texture information is quite limited, solving the puzzles requires the model to spot the shape patterns of the cartoon faces. The key idea of GraphJigsaw is to construct a jigsaw puzzle by randomly shuffling the intermediate convolutional feature maps in the spatial dimension and to exploit the GCN to reason about and recover the correct layout of the jigsaw fragments in a self-supervised manner. GraphJigsaw thus avoids training the classification model on deconstructed images, which would introduce noisy patterns that are harmful to the final classification. Specifically, GraphJigsaw can be incorporated at various stages of the classification model in a top-down manner, which facilitates gradual propagation of the learned shape patterns. GraphJigsaw does not rely on any extra manual annotation during training and adds no extra computational burden at inference time. Both quantitative and qualitative experimental results verify the feasibility of the proposed GraphJigsaw, which consistently outperforms other face recognition and jigsaw-based methods on two popular cartoon face datasets with considerable improvements.
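
The puzzle-construction step described above (shuffling an intermediate feature map in the spatial dimension) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `shuffle_feature_map`, the square `grid` of equal-sized fragments, and the use of the permutation as the recovery target are assumptions made for illustration only, and the GCN that would predict the layout is not shown.

```python
import numpy as np


def shuffle_feature_map(feat, grid, rng):
    """Cut a (C, H, W) feature map into grid x grid fragments and shuffle them.

    Hypothetical helper for illustration. Returns the shuffled map and
    `perm`, where perm[i] is the original (row-major) index of the fragment
    now placed at slot i; a layout-recovery model could use it as a target.
    """
    c, h, w = feat.shape
    assert h % grid == 0 and w % grid == 0, "H and W must be divisible by grid"
    ph, pw = h // grid, w // grid

    # (C, H, W) -> (grid*grid, C, ph, pw): one entry per fragment, row-major.
    patches = (
        feat.reshape(c, grid, ph, grid, pw)
        .transpose(1, 3, 0, 2, 4)
        .reshape(grid * grid, c, ph, pw)
    )

    # Random spatial permutation of the fragments.
    perm = rng.permutation(grid * grid)
    shuffled = patches[perm]

    # Stitch the shuffled fragments back into a (C, H, W) map.
    out = (
        shuffled.reshape(grid, grid, c, ph, pw)
        .transpose(2, 0, 3, 1, 4)
        .reshape(c, h, w)
    )
    return out, perm


# Example: an 8-channel 4x4 feature map split into a 2x2 puzzle.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4)).astype(np.float32)
shuffled, perm = shuffle_feature_map(feat, grid=2, rng=rng)
```

Because only the feature map is permuted, the input image itself stays intact, which matches the abstract's point that the classifier is never trained on deconstructed images.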
