Accurate Estimation of Human Body Orientation From RGB-D Sensors

Accurate estimation of human body orientation can significantly enhance the analysis of human behavior, which is a fundamental task in computer vision. However, existing orientation estimation methods cannot handle the wide variety of body poses and appearances. In this paper, we propose an RGB-D-based orientation estimation method to address these challenges. By utilizing RGB-D information, which can be acquired in real time by RGB-D sensors, our method is robust to cluttered environments, illumination changes, and partial occlusions. Specifically, efficient static and motion cue extraction methods are proposed based on RGB-D superpixels to reduce the noise of depth data. Since it is hard to discriminate among all 360° of orientation using static cues or motion cues independently, we propose a dynamic Bayesian network system (DBNS) that exploits the complementary nature of both static and motion cues. To verify the proposed method, we build an RGB-D-based human body orientation dataset that covers a wide diversity of poses and appearances. Our extensive experimental evaluations on this dataset demonstrate the effectiveness and efficiency of the proposed method.
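To illustrate how temporal Bayesian fusion can combine two complementary cues, the following is a minimal sketch of a discrete Bayes filter over orientation bins. It is not the DBNS described in this paper: the 8-bin discretization, the circular transition model, the function names, and the example likelihood values are all assumptions made for illustration.

```python
from typing import List, Sequence

N_BINS = 8  # assumed discretization: 8 orientation bins of 45 degrees each


def normalize(p: Sequence[float]) -> List[float]:
    """Scale non-negative weights so that they sum to one."""
    total = sum(p)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [x / total for x in p]


def predict(belief: Sequence[float], stay: float = 0.6) -> List[float]:
    """Propagate the belief one step with an assumed circular transition model.

    The orientation stays in its bin with probability `stay` and moves to
    either neighboring bin with probability (1 - stay) / 2.
    """
    n = len(belief)
    side = (1.0 - stay) / 2.0
    return [
        stay * belief[i] + side * belief[(i - 1) % n] + side * belief[(i + 1) % n]
        for i in range(n)
    ]


def update(
    belief: Sequence[float],
    static_lik: Sequence[float],
    motion_lik: Sequence[float],
) -> List[float]:
    """One filter step: predict, then weight by both cue likelihoods.

    The two cues are treated as conditionally independent given the
    orientation, which is a simplifying assumption of this sketch.
    """
    prior = predict(belief)
    posterior = [p * s * m for p, s, m in zip(prior, static_lik, motion_lik)]
    return normalize(posterior)


def estimate_deg(belief: Sequence[float]) -> float:
    """Return the start angle, in degrees, of the most probable bin."""
    n = len(belief)
    best = max(range(n), key=lambda i: belief[i])
    return best * 360.0 / n


if __name__ == "__main__":
    belief = [1.0 / N_BINS] * N_BINS
    # Hypothetical static cue that cannot separate bin 0 from bin 4.
    static_lik = [0.35, 0.05, 0.05, 0.05, 0.35, 0.05, 0.05, 0.05]
    # Hypothetical motion cue that favors bin 4 and its neighbors.
    motion_lik = [0.05, 0.05, 0.05, 0.15, 0.40, 0.15, 0.05, 0.10]
    belief = update(belief, static_lik, motion_lik)
    print(estimate_deg(belief))
```

In this toy setting the static likelihood alone is tied between two opposite bins, and multiplying in the motion likelihood breaks the tie, which is the kind of complementarity the abstract refers to.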
