Comparing gesture recognition accuracy using color and depth information

In human-computer interaction applications, gesture recognition has the potential to provide a natural way of communication between humans and machines. The technology is becoming mature enough to be widely available to the public and real-world computer vision applications start to emerge. A typical example of this trend is the gaming industry and the launch of Microsoft's new camera: the Kinect. Other domains, where gesture recognition is needed, include but are not limited to: sign language recognition, virtual reality environments and smart homes. A key challenge for such real-world applications is that they need to operate in complex scenes with cluttered backgrounds, various moving objects and possibly challenging illumination conditions. In this paper we propose a method that accommodates such challenging conditions by detecting the hands using scene depth information from the Kinect. On top of our detector we employ a dynamic programming method for recognizing gestures, namely Dynamic Time Warping (DTW). Our method is translation and scale invariant which is a desirable property for many HCI systems. We have tested the performance of our approach on a digits recognition system. All experimental datasets include hand signed digits gestures but our framework can be generalized to recognize a wider range of gestures.

[1]  Tetsunori Kobayashi,et al.  Extension of hidden Markov models to deal with multiple candidates of observations and its application to mobile-robot-oriented gesture recognition , 2002, Object recognition supported by user interaction for service robots.

[2]  James L. Crowley,et al.  Active hand tracking , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[3]  William T. Freeman Computer vision for television and games , 1999, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378).

[4]  Chung-Lin Huang,et al.  Hand gesture recognition using a real-time tracking method and hidden Markov models , 2003, Image Vis. Comput..

[5]  Stan Sclaroff,et al.  A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  D HagerGregory,et al.  Probabilistic Data Association Methods for Tracking Complex Visual Objects , 2001 .

[7]  Stan Sclaroff,et al.  Simultaneous Localization and Recognition of Dynamic Hand Gestures , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[8]  A. Corradini,et al.  Dynamic time warping for off-line recognition of a small gesture vocabulary , 2001, Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems.

[9]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[10]  Michael Isard,et al.  ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework , 1998, ECCV.

[11]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[12]  Michael J. Black,et al.  Recognizing temporal trajectories using the condensation algorithm , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[13]  James M. Rehg,et al.  Statistical Color Models with Application to Skin Detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[14]  Jochen Triesch,et al.  Robotic Gesture Recognition , 1997, Gesture Workshop.

[15]  Gregory D. Hager,et al.  Probabilistic Data Association Methods for Tracking Complex Visual Objects , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Alex Pentland,et al.  Space-time gestures , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Eamonn J. Keogh,et al.  Exact indexing of dynamic time warping , 2002, Knowledge and Information Systems.