Patch-Set-Based Representation for Alignment-Free Image Set Classification

This paper presents a patch-set-based sparse representation for image set classification. Compared with image-based image set representation, our patch-set-based representation is alignment free and thus has an advantage for tasks such as video-based face recognition, image-set-based object recognition, and video-based hand gesture recognition, where precise alignment is usually difficult or even impossible because of large variance in view angle or pose. Specifically, to bypass the alignment issue, we adopt a patch-based image set representation by dividing each image within each set into patches. We then cluster all the training patches into multiple clusters and assign each test patch to a cluster based on the cluster centers of the training patches. The labels of the test patches within each cluster are inferred from a patch-set-based sparse representation for classification, and the labels of all test patches from all the clusters are then aggregated to predict a single label for the test set. Experimental results on video-based face recognition data sets (CMU-MoBo and YouTube Celebrities), an image-set-based object recognition data set (ETH-80), and a video-based hand gesture recognition data set (Kinect Hand Gestures) demonstrate that our proposed method consistently outperforms existing methods, and the improvement is especially significant on the YouTube Celebrities and Kinect Hand Gestures data sets. Moreover, we quantitatively show the robustness of our method to misalignment on the Multi-PIE data set.
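The pipeline described above (patch extraction, clustering of training patches, per-cluster classification, and label aggregation) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names, patch size, stride, number of clusters, and regularization weight are all hypothetical; plain k-means stands in for whatever clustering the paper uses; and each test patch is sparse-coded independently with ISTA and labeled by class-wise reconstruction residual, which simplifies the joint patch-set-based sparse representation named in the abstract. Aggregation is assumed to be a majority vote.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Split a 2-D image into flattened, L2-normalised square patches."""
    h, w = image.shape
    patches = np.asarray([image[r:r + size, c:c + size].ravel()
                          for r in range(0, h - size + 1, stride)
                          for c in range(0, w - size + 1, stride)], dtype=float)
    norms = np.linalg.norm(patches, axis=1, keepdims=True)
    return patches / np.maximum(norms, 1e-12)

def nearest_center(x, centers):
    """Index of the closest center (squared Euclidean distance) per row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        assign = nearest_center(x, centers)
        for j in range(k):
            if np.any(assign == j):          # leave empty clusters unchanged
                centers[j] = x[assign == j].mean(axis=0)
    return centers

def ista(D, y, lam=0.01, iters=200):
    """Minimise 0.5*||y - D a||^2 + lam*||a||_1 by iterative soft thresholding."""
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2, 1e-12)
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        z = a - step * (D.T @ (D @ a - y))
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return a

def classify_set(train_sets, train_labels, test_set, size=4, stride=4, k=2):
    """Predict one label for test_set (a list of 2-D arrays)."""
    # 1. Pool the patches of every training image, remembering their set label.
    train_p, patch_labels = [], []
    for images, label in zip(train_sets, train_labels):
        for img in images:
            p = extract_patches(img, size, stride)
            train_p.append(p)
            patch_labels.extend([label] * len(p))
    train_p = np.vstack(train_p)
    patch_labels = np.asarray(patch_labels)

    # 2. Cluster training patches; route test patches to the nearest center.
    centers = kmeans(train_p, k)
    train_assign = nearest_center(train_p, centers)
    test_p = np.vstack([extract_patches(img, size, stride) for img in test_set])
    test_assign = nearest_center(test_p, centers)

    # 3. Within each cluster, label every test patch by the class whose
    #    training patches reconstruct it with the smallest residual.
    classes = np.unique(patch_labels)
    votes = {c: 0 for c in classes}
    for j in range(k):
        in_train, in_test = train_assign == j, test_assign == j
        if not in_train.any() or not in_test.any():
            continue
        D = train_p[in_train].T              # columns are dictionary atoms
        atom_labels = patch_labels[in_train]
        for y in test_p[in_test]:
            a = ista(D, y)
            residuals = [np.linalg.norm(y - D[:, atom_labels == c] @ a[atom_labels == c])
                         for c in classes]
            votes[classes[int(np.argmin(residuals))]] += 1

    # 4. Aggregate the patch labels into a single set label (majority vote).
    return max(votes, key=votes.get)
```

Because patches are matched to clusters by appearance rather than by pixel position, a patch can find similar training patches even when the images are shifted relative to each other, which is the intuition behind the alignment-free claim. In practice the patch size, stride, and number of clusters would need tuning per data set.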
