Visual Tracking of Human Head and Arms Using Adaptive Multiple Importance Sampling on a Single Camera in Cluttered Environments

This paper presents a 2D upper-body tracking algorithm that uses a single monocular camera. The proposed method can be applied on a stationary or a moving camera platform and achieves real-time performance in cluttered environments, which makes it well suited to human-machine interaction. The algorithm extracts body parts even when the target person is close to other objects, a situation that commonly causes problems for depth-based camera systems. Extracting a subject's head and arms in real time serves as a preprocessing step for determining the subject's current action. The method presents two key innovations. First, multiple visual cues are integrated dynamically by an adaptive multiple importance sampling particle filter to generate hypotheses. These hypotheses can efficiently estimate a variety of arm gestures in images captured in cluttered environments. Second, multiple visual cues for the human face and arms are devised to verify, quickly and effectively, the hypotheses produced by the multiple importance sampling scheme. To validate the effectiveness of the proposed tracking approach, several experiments were performed, and the results appear promising.
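The idea of an adaptive multiple importance sampling step can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional state, the Gaussian proposals standing in for cue-driven hypothesis generators, the function names `gauss_pdf` and `mis_step`, and the `floor` parameter of the adaptation rule are all assumptions made for illustration. Each particle is drawn from a mixture of proposals, weighted by the target density divided by the full mixture density (the balance heuristic), and the mixture weights are then shifted toward the proposals whose samples received the most weight.

```python
import math
import random


def gauss_pdf(x, mean, std):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mis_step(proposals, alphas, prior, likelihood, n, rng, floor=0.1):
    """One multiple importance sampling step over a mixture of proposals.

    proposals  : list of (mean, std) Gaussians, one per cue (hypothetical stand-ins)
    alphas     : mixture weights of the proposals, summing to 1
    prior      : callable x -> density of the predicted state
    likelihood : callable x -> observation likelihood
    floor      : assumed minimum total share kept for exploration
    Returns (particles, normalized_weights, updated_alphas).
    """
    count = len(proposals)
    particles, weights, sources = [], [], []
    for _ in range(n):
        # Pick a proposal according to the current mixture weights, then sample it.
        k = rng.choices(range(count), weights=alphas)[0]
        mean, std = proposals[k]
        x = rng.gauss(mean, std)
        # Balance heuristic: divide by the density of the whole mixture,
        # not only by the density of the proposal that produced x.
        mix = sum(a * gauss_pdf(x, m, s) for a, (m, s) in zip(alphas, proposals))
        particles.append(x)
        weights.append(prior(x) * likelihood(x) / mix)
        sources.append(k)

    total = sum(weights)
    if total > 0.0:
        weights = [w / total for w in weights]
    else:
        # Degenerate case: no particle explains the observation.
        weights = [1.0 / n] * n

    # Adaptation (assumed rule): each proposal's new share is the weight mass
    # its samples earned, blended with a uniform floor so none vanishes.
    mass = [0.0] * count
    for k, w in zip(sources, weights):
        mass[k] += w
    new_alphas = [floor / count + (1.0 - floor) * m for m in mass]
    return particles, weights, new_alphas
```

In a toy call with one proposal centred near the true state and one centred on clutter, the clutter-driven proposal loses most of its mixture weight after a single step, while the `floor` keeps it available in case the scene changes.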
