Hand Modeling and Tracking for Video-Based Sign Language Recognition by Robust Principal Component Analysis

Hand modeling and tracking are essential in video-based sign language recognition. The high deformability of hands and their large number of degrees of freedom make the problem difficult. To tackle these challenges, a novel approach based on robust principal component analysis (PCA) is proposed. The robust PCA incorporates an L1-norm objective function to deal with background clutter, and a projection pursuit strategy to deal with the lack of alignment caused by hand deformation. The learning algorithm of the robust PCA is very simple: it involves only a search for solutions in a finite set constructed from the training data, and it yields bases that are much more representative and interpretable. Incorporating L1 regularization when fitting the learned robust PCA models results in cleaner reconstructions and more stable fitting. Based on the robust PCA, a hand tracking system is developed that combines skin-color region segmentation based on graph cuts with template matching in a particle filtering framework. Experiments on a publicly available sign-language video database demonstrate the strength of the method.
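The idea of learning bases by searching a finite set built from the training data can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one common projection-pursuit construction in which the candidate directions are the median-centered training samples normalized to unit length, the objective is the L1 dispersion of the projections, and further bases are found greedily after deflation. The function name `l1_projection_pursuit_pca` and all parameters are hypothetical.

```python
import numpy as np


def l1_projection_pursuit_pca(X, n_components):
    """Greedy L1-dispersion bases chosen from a finite candidate set.

    Assumptions (not taken from the paper): median centering, candidates
    equal to the unit-normalized residual samples, and deflation between
    components.
    """
    X = np.asarray(X, dtype=float)
    residual = X - np.median(X, axis=0)  # robust centering (assumed)
    # Samples whose residual is negligible relative to the data scale
    # cannot define a direction and are skipped as candidates.
    tol = 1e-12 * max(np.linalg.norm(residual, axis=1).max(), 1.0)
    bases = []
    for _ in range(n_components):
        norms = np.linalg.norm(residual, axis=1)
        keep = norms > tol
        if not np.any(keep):
            break  # nothing left to explain
        # Finite candidate set: one unit vector per remaining sample.
        candidates = residual[keep] / norms[keep, None]
        # L1 dispersion of all samples along each candidate: sum_i |x_i . w|
        scores = np.abs(residual @ candidates.T).sum(axis=0)
        w = candidates[np.argmax(scores)]
        bases.append(w)
        # Deflate: remove the component along w from every sample.
        residual = residual - np.outer(residual @ w, w)
    return np.array(bases)
```

Because every candidate is a training sample (up to centering and scaling), each learned basis of the first component corresponds to an actual observation, which is one plausible reading of why such bases are easier to interpret than standard PCA eigenvectors. The exhaustive search costs on the order of n² projections per component for n samples.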
