Fitting distal limb segments for accurate skeletonization in human action recognition

This paper presents a novel method for detecting distal limb segments, enabling accurate skeletonization of human limbs in visual data for human action recognition. After background subtraction, a medial axis transform is applied to the body silhouette to detect the torso and the limbs. A nine-segment skeleton model is then fitted to the medial axis using a line-fitting algorithm. The fitting is performed independently for each limb, which speeds up the process and avoids combinatorial complexity problems. The nine-segment skeleton model provides precise endpoints for the distal segment of each limb, which are reduced to centroids for efficient action representation. We believe that distal limb segments such as forearms and shins provide sufficient and compact information for human action recognition. Each limb centroid is described by its angle with respect to the vertical body axis, yielding a six-element descriptor vector that represents the position of the torso and five angles for limb segments. The nine-segment skeleton model is detected and tracked without any manual initialization. A Gaussian Mixture Model is used to represent the action descriptors for several human actions, and a maximum log-likelihood criterion is then used to classify actions. To evaluate our approach, we used three action datasets with different resolutions and compared the results with other approaches. The average recognition rate reaches a maximum of 98% on the high-resolution dataset and a minimum of 90% on the low-resolution dataset.
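
The descriptor and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes image coordinates with y growing downward, angles measured in degrees from the upward vertical through a torso reference point, a single scalar standing in for the torso position, and diagonal-covariance mixture components whose parameters are already known. All function names, the `torso_feature` parameter, and the `(weight, mean, variance)` component layout are hypothetical choices made for illustration.

```python
import math


def segment_centroid(p, q):
    """Midpoint of a segment given its two endpoints (x, y)."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def angle_from_vertical(origin, point):
    """Signed angle in degrees between the upward vertical through
    `origin` and the ray origin -> point.

    Assumed convention: image y grows downward, 0 means straight up,
    positive angles are to the right, result lies in [-180, 180].
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    return math.degrees(math.atan2(dx, -dy))


def build_descriptor(torso_center, torso_feature, distal_segments):
    """Six-element vector: one torso value followed by five angles.

    `torso_feature` is a hypothetical scalar summarizing torso position;
    `distal_segments` is a list of five (endpoint, endpoint) pairs.
    """
    angles = [
        angle_from_vertical(torso_center, segment_centroid(p, q))
        for p, q in distal_segments
    ]
    return [torso_feature] + angles


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a mixture of (weight, mean, var) triples,
    computed with the log-sum-exp trick for numerical stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(x, models):
    """Return the action label whose mixture gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))
```

In use, `build_descriptor` would be called once per frame with the fitted distal segment endpoints, and `classify` would receive a dictionary mapping each action label to its mixture components. In practice the mixture parameters would be learned from training descriptors, for example with expectation-maximization, which this sketch omits. Angles that wrap around at ±180 degrees are also not treated specially here.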
