Conditional Convolutional Neural Network for Modality-Aware Face Recognition

Faces in the wild are usually captured with various poses, illuminations and occlusions, and thus inherently multimodally distributed in many tasks. We propose a conditional Convolutional Neural Network, named as c-CNN, to handle multimodal face recognition. Different from traditional CNN that adopts fixed convolution kernels, samples in c-CNN are processed with dynamically activated sets of kernels. In particular, convolution kernels within each layer are only sparsely activated when a sample is passed through the network. For a given sample, the activations of convolution kernels in a certain layer are conditioned on its present intermediate representation and the activation status in the lower layers. The activated kernels across layers define the sample-specific adaptive routes that reveal the distribution of underlying modalities. Consequently, the proposed framework does not rely on any prior knowledge of modalities in contrast with most existing methods. To substantiate the generic framework, we introduce a special case of c-CNN via incorporating the conditional routing of the decision tree, which is evaluated with two problems of multimodality - multi-view face identification and occluded face verification. Extensive experiments demonstrate consistent improvements over the counterparts unaware of modalities.

[1]  B. Kowalski,et al.  Partial least-squares regression: a tutorial , 1986 .

[2]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[3]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[6]  Zhuowen Tu,et al.  Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Josef Kittler,et al.  Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Yongsheng Gao,et al.  Automatic Texture Synthesis for Face Recognition from Single Views , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[10]  Harry Shum,et al.  Face Hallucination: Theory and Practice , 2007, International Journal of Computer Vision.

[11]  Ramakant Nevatia,et al.  Cluster Boosted Tree Classifier for Multi-View, Multi-Pose Object Detection , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Sanjoy Dasgupta,et al.  Random projection trees and low dimensional manifolds , 2008, STOC.

[13]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[14]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[15]  D. Jacobs,et al.  Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch , 2011, CVPR 2011.

[16]  Jonghyun Choi,et al.  Robust pose invariant face recognition using coupled latent space discriminant analysis , 2012, Comput. Vis. Image Underst..

[17]  Xin Liu,et al.  Morphable Displacement Field Based Image Matching for Face Recognition across Pose , 2012, ECCV.

[18]  Amit R.Sharma,et al.  Face Photo-Sketch Synthesis and Recognition , 2012 .

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Jian Sun,et al.  Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Gang Hua,et al.  Probabilistic Elastic Matching for Pose Variant Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Xiaogang Wang,et al.  Deep Learning Identity-Preserving Face Space , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[24]  Shiguang Shan,et al.  Stacked Progressive Auto-Encoders (SPAE) for Face Recognition Across Poses , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[28]  Peter Kontschieder,et al.  Neural Decision Forests for Semantic Image Labelling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Wenhan Luo,et al.  Unified Face Analysis by Iterative Multi-output Random Forests , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Xiaogang Wang,et al.  Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations , 2014, NIPS.

[31]  Antonio Criminisi,et al.  Filter Forests for Learning Data-Dependent Convolutional Kernels , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Shiguang Shan,et al.  Multi-View Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.