Discriminative Exemplar Coding for Sign Language Recognition With Kinect

Sign language recognition is a growing research area in the field of computer vision. A challenge within it is to model various signs, varying with time resolution, visual manual appearance, and so on. In this paper, we propose a discriminative exemplar coding (DEC) approach, as well as utilizing Kinect sensor, to model various signs. The proposed DEC method can be summarized as three steps. First, a quantity of class-specific candidate exemplars are learned from sign language videos in each sign category by considering their discrimination. Then, every video of all signs is described as a set of similarities between frames within it and the candidate exemplars. Instead of simply using a heuristic distance measure, the similarities are decided by a set of exemplar-based classifiers through the multiple instance learning, in which a positive (or negative) video is treated as a positive (or negative) bag and those frames similar to the given exemplar in Euclidean space as instances. Finally, we formulate the selection of the most discriminative exemplars into a framework and simultaneously produce a sign video classifier to recognize sign. To evaluate our method, we collect an American sign language dataset, which includes approximately 2000 phrases, while each phrase is captured by Kinect sensor with color, depth, and skeleton information. Experimental results on our dataset demonstrate the feasibility and effectiveness of the proposed approach for sign language recognition.

[1]  Yang Wang,et al.  Unsupervised Discovery of Action Classes , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  J. Sullivan,et al.  Action Recognition by Shape Matching to Key Frames , 2002 .

[3]  Y. Freund,et al.  Discussion of the Paper \additive Logistic Regression: a Statistical View of Boosting" By , 2000 .

[4]  Yixin Chen,et al.  MILES: Multiple-Instance Learning via Embedded Instance Selection , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Thad Starner,et al.  Visual Recognition of American Sign Language Using Hidden Markov Models. , 1995 .

[7]  Changsheng Xu,et al.  Inductive Robust Principal Component Analysis , 2012, IEEE Transactions on Image Processing.

[8]  Changsheng Xu,et al.  A generic framework for event detection in various video domains , 2010, ACM Multimedia.

[9]  Ling Shao,et al.  Enhanced Computer Vision With Microsoft Kinect Sensor: A Review , 2013, IEEE Transactions on Cybernetics.

[10]  Hermann Hienz,et al.  Video-based continuous sign language recognition using statistical methods , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[11]  Wen Gao,et al.  Large vocabulary sign language recognition based on hierarchical decision trees , 2003, ICMI '03.

[12]  Andrew Blake,et al.  Probabilistic tracking in a metric space , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[13]  Larry S. Davis,et al.  Learning dynamics for exemplar-based gesture recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[14]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[16]  Paul Lukowicz,et al.  Using multiple sensors for mobile sign language recognition , 2003, Seventh IEEE International Symposium on Wearable Computers, 2003. Proceedings..

[17]  A. Fathi,et al.  Human Pose Estimation using Motion Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Nazli Ikizler-Cinbis,et al.  Learning actions from the Web , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[21]  Lei Wang,et al.  In defense of soft-assignment coding , 2011, 2011 International Conference on Computer Vision.

[22]  Robert W. Lindeman,et al.  A new instrumented approach for translating American Sign Language into sound and text , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[23]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[24]  Václav Hlavác,et al.  Pose primitive based human action recognition in videos or still images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[26]  Harpreet S. Sawhney,et al.  PEET: Prototype Embedding and Embedding Transition for Matching Vehicles over Disparate Viewpoints , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Stan Sclaroff,et al.  Estimating 3D hand pose from a cluttered image , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[28]  Changsheng Xu,et al.  Boosted Exemplar Learning for Action Recognition and Annotation , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[29]  A. Enis Çetin,et al.  Silhouette-Based Method for Object Classification and Human Action Recognition in Video , 2006, ECCV Workshop on HCI.

[30]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[31]  Luc Van Gool,et al.  Action snippets: How many frames does human action recognition require? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[33]  Edmond Boyer,et al.  Action recognition using exemplar-based embedding , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.