Comparison of 2D and 3D Analysis For Automated Cued Speech Gesture Recognition

This paper addresses the automated classification of cued speech gestures. Cued speech is a specific gesture language (different from sign language) used for communication between deaf and hearing people; it relies on only eight different hand configurations. The aim of this work is to apply a simple classifier to three image data sets in order to answer two main questions: is 3D data needed, and how important is the quality of the hand segmentation? The first data set consists of images acquired with a single camera in a controlled lighting environment, segmented using luminance information only (called "2D segmentation"). The second data set is acquired with a 3D camera that produces a depth map; the hand configurations are segmented using both the video and the depth map (called "3D segmentation"). The third data set consists of the 3D-segmented masks warped to compensate for hand pose variations. For classification, each hand configuration is characterized by the seven Hu moment invariants, and supervised classification is then performed with a multi-layer perceptron. The classification performances obtained from 2D and 3D information are compared.
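To make the processing chain concrete, the sketch below illustrates the kind of pipeline described above for the 2D case: a luminance-based hand segmentation, the seven Hu moment invariants as shape features, and a multi-layer perceptron classifier over the eight hand configurations. This is a minimal illustration assuming OpenCV and scikit-learn; the segmentation threshold, feature scaling, network size, and function names are assumptions for readability and do not reproduce the authors' implementation.

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier


def segment_hand_2d(gray_image):
    """Luminance-based ('2D') segmentation: Otsu thresholding on a
    grayscale image, returning a binary hand mask (illustrative only)."""
    _, mask = cv2.threshold(gray_image, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask


def hu_features(mask):
    """Compute the seven Hu moment invariants of a binary hand mask.
    A log transform is commonly applied to compress their dynamic
    range before classification (an assumption, not from the paper)."""
    moments = cv2.moments(mask.astype(np.uint8), binaryImage=True)
    hu = cv2.HuMoments(moments).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)


def train_classifier(gray_images, labels):
    """Hypothetical training loop: gray_images is a list of grayscale
    hand images, labels the corresponding cued-speech configuration
    indices (0..7). Hidden-layer size and iteration count are guesses."""
    X = np.array([hu_features(segment_hand_2d(img)) for img in gray_images])
    clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000)
    clf.fit(X, labels)
    return clf
```

The same feature extraction and classifier would apply unchanged to the 3D-segmented and pose-compensated masks; only the segmentation step differs between the three data sets.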
