Multi-manifold deep metric learning for image set classification

In this paper, we propose a multi-manifold deep metric learning (MMDML) method for image set classification, which aims to recognize an object of interest from a set of image instances captured from varying viewpoints or under varying illumination. Manifolds can effectively model the nonlinearity of the samples in each image set, and deep learning has demonstrated a strong capability to model nonlinear structure in data. Motivated by these two observations, MMDML learns multiple sets of nonlinear transformations, one set for each object class, that map the image instances of the sets into a shared feature subspace in which the manifold margin between different classes is maximized, so that discriminative and class-specific information are exploited simultaneously. Our method achieves state-of-the-art performance on five widely used datasets.
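
The idea of one nonlinear transformation per class can be illustrated with a minimal NumPy sketch of the inference step only. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`forward`, `set_distance`, `classify`), the tanh layers, the mean nearest-neighbor set distance, and the fixed identity weights that stand in for trained, margin-maximizing networks.

```python
import numpy as np

def forward(layers, X):
    """Apply a stack of (W, b) layers with tanh activations to the rows of X."""
    H = X
    for W, b in layers:
        H = np.tanh(H @ W + b)
    return H

def set_distance(A, B):
    """Mean distance from each row of A to its nearest row of B.

    Assumed stand-in for a manifold-to-manifold distance.
    """
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return pairwise.min(axis=1).mean()

def classify(nets, train_sets, test_set):
    """Pass the test set through each class-specific network and return the
    class whose transformed training set lies closest to it."""
    dists = [
        set_distance(forward(net, test_set), forward(net, train))
        for net, train in zip(nets, train_sets)
    ]
    return int(np.argmin(dists))

# Toy data: two classes of 4-dimensional "image" features (hypothetical).
rng = np.random.default_rng(0)
dim = 4
train_sets = [
    rng.normal(-0.5, 0.05, size=(20, dim)),  # image set of class 0
    rng.normal(0.5, 0.05, size=(20, dim)),   # image set of class 1
]
test_set = rng.normal(0.5, 0.05, size=(10, dim))  # drawn near class 1

# Untrained placeholder networks: one identity layer per class. A real system
# would learn these weights by maximizing the margin between class manifolds.
nets = [[(np.eye(dim), np.zeros(dim))] for _ in train_sets]

pred = classify(nets, train_sets, test_set)
print(pred)
```

In this sketch each class owns its network, so the same test set is embedded once per class before distances are compared. Training the weights against a margin objective is left out.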
