3D head pose estimation using the Kinect

Head pose estimation plays an essential role in bridging the information gap between humans and computers. Conventional head pose estimation methods mostly operate on images captured by cameras. However, accurate and robust pose estimation from such images is often problematic. In this paper, we present an algorithm for recovering the six degrees of freedom (DOF) of head motion from a sequence of range images taken by the Microsoft Kinect for Xbox 360. The proposed algorithm uses a least-squares minimization of the difference between the measured rate of change of depth at a point and the rate predicted by the depth rate constraint equation. We first segment the human head from its surroundings and background, and then estimate the head motion. Our system can recover the six DOF of head motion for multiple people in one image. The proposed system was evaluated in our lab and produced superior results.
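As an illustration of the least-squares step, the following is a minimal sketch and not the paper's implementation. It assumes the range image has already been converted to a surface Z(X, Y) on a regular Cartesian grid, and that a head point R = (X, Y, Z) moves with velocity t + ω × R, where t = (U, V, W) is the translational velocity and ω = (A, B, C) the rotational velocity. Under that sign convention (an assumption made here), the depth rate constraint is Z_t = W + AY - BX - p(U + BZ - CY) - q(V + CX - AZ), with p = ∂Z/∂X and q = ∂Z/∂Y, which is linear in the six unknowns. The function names, the optional `mask` argument standing in for head segmentation, and the synthetic surface are all hypothetical.

```python
import numpy as np

def constraint_matrix(X, Y, Z, dx, dy):
    """Coefficients of (U, V, W, A, B, C) in the depth rate constraint,
    one row per pixel. Sign convention is the one assumed in the text above."""
    # Surface slopes: rows of Z index Y, columns index X.
    q, p = np.gradient(Z, dy, dx)
    cols = [-p, -q, np.ones_like(Z), Y + q * Z, -X - p * Z, p * Y - q * X]
    return np.stack([c.ravel() for c in cols], axis=1)

def estimate_motion(X, Y, Z, Zt, dx, dy, mask=None):
    """Least-squares estimate of (U, V, W, A, B, C) from measured depth rates Zt.
    `mask` (boolean, same shape as Z) selects the segmented head pixels."""
    M = constraint_matrix(X, Y, Z, dx, dy)
    b = Zt.ravel()
    if mask is not None:
        keep = mask.ravel()
        M, b = M[keep], b[keep]
    motion, *_ = np.linalg.lstsq(M, b, rcond=None)
    return motion

# Synthetic check: a made-up smooth surface about 0.8 m from the sensor.
xs = np.linspace(-0.1, 0.1, 41)
ys = np.linspace(-0.1, 0.1, 41)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys)
Z = 0.8 + 2.0 * X**2 + 3.0 * Y**2 + 0.5 * X * Y + 10.0 * X**3 + 0.02 * np.sin(40.0 * Y)

# Depth rates generated from the same linear model with a known motion,
# so this only verifies the solver, not real Kinect data.
true_motion = np.array([0.01, -0.02, 0.03, 0.10, -0.05, 0.20])
Zt = (constraint_matrix(X, Y, Z, dx, dy) @ true_motion).reshape(Z.shape)

recovered = estimate_motion(X, Y, Z, Zt, dx, dy)
print(np.round(recovered, 4))
```

With real sensor data, Z_t would instead come from differencing consecutive depth frames, and the estimate would be sensitive to depth noise and to surfaces whose shape leaves some motion components ambiguous (a plane or a sphere, for example).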
