Simultaneous segmentation and recognition of hand gestures for human-robot interaction

Gestures are a natural form of communication between people and are increasingly used for human-robot interaction. Many automatic techniques exist for recognizing gestures; however, most of them assume that gestures have already been segmented from continuous video, which is clearly an unrealistic assumption for human-robot interaction. For instance, when commanding a service robot, the agent must be aware of the world at all times (e.g., via continuous video) and ready to react when a user gives an order (e.g., using a gesture). In this paper we propose a method that addresses both tasks, segmentation and recognition of gestures, simultaneously. The proposed method is based on a novel video-stream exploration scheme called multi-size dynamic windows. Several windows of different sizes are created dynamically, and each window is classified by a Hidden Markov Model (HMM). Predictions are combined via a voting strategy until the endpoint of a gesture is detected (segmentation). At that moment the method recognizes the gesture that has just been performed using a majority-vote decision (recognition). The proposed method is intended for commanding a service robot by capturing information about the user's movements with a Kinect™ sensor. We evaluated the proposed method experimentally with 5 different gestures suitable for commanding a service robot. Experimental results show that up to 82.76% of the gestures are correctly segmented; the corresponding recognition performance was 89.58%. We consider this performance acceptable for certain human-robot interaction scenarios.
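The multi-size dynamic-window scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window sizes, the vote threshold, the gesture labels, and the `classify` stub (which stands in for the per-window HMM classification) are all assumptions made for the example.

```python
from collections import Counter, deque

WINDOW_SIZES = [10, 20, 30]  # window lengths in frames (assumed values)
VOTE_THRESHOLD = 2           # votes needed to declare an endpoint (assumed)


def classify(window):
    """Placeholder for per-window classification.

    In the paper, each window is scored by HMMs and the most likely
    gesture label is returned; None stands for 'no gesture'.
    """
    raise NotImplementedError


def stream_segmenter(frames, classify=classify):
    """Explore the stream with several windows ending at the current frame.

    Yields (endpoint_index, gesture) whenever enough windows agree,
    i.e., the gesture endpoint is detected (segmentation) and the
    gesture is recognized by majority vote (recognition).
    """
    buffer = deque(maxlen=max(WINDOW_SIZES))
    for t, frame in enumerate(frames):
        buffer.append(frame)
        votes = Counter()
        for size in WINDOW_SIZES:
            if len(buffer) >= size:
                window = list(buffer)[-size:]  # dynamic window ending at t
                votes[classify(window)] += 1
        if votes:
            gesture, count = votes.most_common(1)[0]
            if gesture is not None and count >= VOTE_THRESHOLD:
                yield t, gesture   # endpoint detected, gesture recognized
                buffer.clear()     # restart exploration after a detection
```

For example, with a toy `classify` that labels a window only when all its frames carry the same gesture, a stream of idle frames followed by a sustained "stop" gesture produces a single detection once two window sizes agree.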
