Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)

Learning from different modalities is a challenging task. In this paper, we address cross-modal face verification and recognition between the caricature and visual image modalities. Caricatures exaggerate the facial features of a person. Because caricatures vary so widely, building vision models that recognize and verify data from this modality is extremely difficult. Visual images, which contain far fewer distortions, can act as a bridge for analyzing the caricature modality. We introduce a publicly available large Caricature-VIsual dataset (CaVI) with images from both modalities that captures the rich variations in the caricatures of an identity. This paper presents the first cross-modal architecture that handles the extreme distortions of caricatures using a deep learning network that learns similar representations across the modalities. We use two convolutional networks along with transformations that are subjected to orthogonality constraints to capture the shared and modality-specific representations. In contrast to prior research, our approach depends neither on manually extracted facial landmarks for learning the representations nor on the identities of the persons for performing verification. The learned shared representation achieves 91% accuracy for verifying unseen images and 75% accuracy on unseen identities. Further, recognizing the identity in the image by knowledge transfer, using a combination of shared and modality-specific representations, results in an unprecedented performance of 85% rank-1 accuracy for caricatures and 95% rank-1 accuracy for visual images.
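The abstract does not spell out the form of the orthogonality constraint. As a minimal sketch, assuming the constraint is expressed as a penalty on the squared Frobenius norm of the product of a shared and a modality-specific transformation matrix (a common choice, not necessarily the one used in CaVINet), the idea can be illustrated as follows. The function name, matrix names, and dimensions are hypothetical.

```python
import numpy as np


def orthogonality_penalty(w_shared, w_specific):
    """Squared Frobenius norm of w_shared^T @ w_specific.

    The value is zero exactly when every column of w_shared is
    orthogonal to every column of w_specific, so minimizing it pushes
    the shared and modality-specific projections toward different
    directions of the feature space.
    """
    cross = w_shared.T @ w_specific
    return float(np.sum(cross ** 2))


rng = np.random.default_rng(0)
feat_dim, shared_dim, specific_dim = 8, 3, 3  # hypothetical sizes

# An orthogonal pair: split the columns of an orthonormal basis.
q, _ = np.linalg.qr(rng.standard_normal((feat_dim, shared_dim + specific_dim)))
w_shared_orth = q[:, :shared_dim]
w_specific_orth = q[:, shared_dim:]

# An unconstrained pair: independent random matrices.
w_shared_rand = rng.standard_normal((feat_dim, shared_dim))
w_specific_rand = rng.standard_normal((feat_dim, specific_dim))

penalty_orth = orthogonality_penalty(w_shared_orth, w_specific_orth)
penalty_rand = orthogonality_penalty(w_shared_rand, w_specific_rand)

print("penalty, orthogonal pair:", penalty_orth)
print("penalty, random pair:", penalty_rand)
```

In a training setup, a term of this kind would typically be added to the verification and identification losses with a weighting factor, one term per modality.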
