Gesture recognition of traffic police based on static and dynamic descriptor fusion

We present a method for recognizing Chinese traffic police gestures, intended for driver assistance systems and intelligent vehicles, that fuses static and dynamic descriptors. First, point cloud data of the human upper body is obtained from each frame of the input video and used to estimate a static descriptor with a 2.5D gesture model. Then, a dynamic descriptor is estimated by computing the motion history image of the input RGB video sequence. Finally, the two descriptors are fused, and the mean structural similarity index is used to recognize the gesture. A comparative study and a qualitative evaluation against other gesture recognition methods show that the proposed method obtains better recognition results on a number of video sequences.
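The dynamic descriptor and the matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the decay length `tau`, the difference threshold, the non-overlapping 8x8 windows (the original SSIM formulation uses a sliding Gaussian window), and the nearest-template decision rule are all assumptions made for illustration. The 2.5D static descriptor and the fusion step are not sketched, because the abstract does not specify them.

```python
import numpy as np


def motion_history_image(frames, tau=10, diff_threshold=30):
    """Build a motion history image (MHI) from grayscale frames.

    Pixels that changed in the latest frame pair are set to tau; older
    motion decays by 1 per frame down to 0. tau and diff_threshold are
    illustrative values, not taken from the paper.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    mhi = np.zeros_like(frames[0])
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr - prev) >= diff_threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalise to [0, 1]


def mean_ssim(x, y, block=8, dynamic_range=1.0):
    """Mean SSIM over non-overlapping block x block windows (a simplification)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    h, w = x.shape
    if h < block or w < block:
        raise ValueError("images must be at least one block in size")
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            a = x[i:i + block, j:j + block]
            b = y[i:i + block, j:j + block]
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
            den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
            scores.append(num / den)
    return float(np.mean(scores))


def classify(query, templates):
    """Return the label of the template with the highest mean SSIM.

    templates is a hypothetical dict mapping gesture label -> descriptor image.
    """
    return max(templates, key=lambda label: mean_ssim(query, templates[label]))
```

In this sketch a query descriptor image is compared against one stored template per gesture class, and the class with the highest mean SSIM is returned. Descriptor values are assumed to lie in [0, 1], which is why `dynamic_range` defaults to 1.0.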
