Manifold Learning for ToF-based Human Body Tracking and Activity Recognition

Recent technological advances have led to the development of cameras that measure depth by means of the time-of-flight (ToF) principle [5]. ToF cameras capture an entire scene instantaneously and thus provide depth images in real time. Despite their relatively low resolution, this type of data offers a clear advantage over conventional cameras for specific applications such as human-machine interaction. In this paper, we propose a method that simultaneously recognises the performed activity and tracks the full-body pose of a person observed by a single ToF camera. Our method removes the need for identifying body parts in sparse and noisy ToF images [4] or for fitting a skeleton using expensive optimisation techniques [1]. The proposed method consists of learning a prior model of human motion and using an efficient, sampling-based inference approach for activity recognition and body tracking (Figure 1). The prior motion model comprises a set of low-dimensional manifold embeddings, one for each activity of interest. We generate the embeddings from full-body pose training data using a manifold learning technique [2]. Each embedding acts as a low-dimensional parametrisation of feasible body poses [3] that constrains the problem of tracking the body from depth cues alone. In a generative tracking framework, we sample the low-dimensional embedding space by means of a particle filter and thus avoid exhaustively searching the full-body pose space. In this way, we track multiple pose hypotheses for different activities and select the one most consistent with the observed depth cues. Our depth feature descriptor, intuitively a sparse 3D human silhouette representation, can easily be extracted from ToF images. The overall method combines the distinctiveness of multiple local, activity-specific motion models into a global model capable of recognising and tracking multiple activities from simple observations.
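The core inference idea described above, sampling pose hypotheses with a particle filter in a learned low-dimensional embedding space and weighting them by their consistency with depth observations, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `decode` function is a hypothetical stand-in for the learned mapping from an activity-specific manifold embedding back to an observable depth feature, and the 1-D latent space replaces the real learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z):
    # Hypothetical stand-in for the learned mapping from the
    # low-dimensional manifold embedding to an observable depth
    # feature (here: a 2-D feature per 1-D latent coordinate).
    return np.concatenate([np.sin(z), np.cos(z)], axis=1)

def pf_step(particles, weights, observation, motion_std=0.05, obs_std=0.1):
    """One particle-filter step in the low-dimensional latent space."""
    # 1. Propagate hypotheses with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # 2. Re-weight each hypothesis by its consistency with the
    #    observed depth feature (Gaussian likelihood).
    err = np.sum((decode(particles) - observation) ** 2, axis=1)
    weights = weights * np.exp(-err / (2.0 * obs_std**2))
    weights /= weights.sum()
    # 3. Resample so computation focuses on plausible poses.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy tracking run: recover a latent pose coordinate of 0.8
# from its (noise-free) observed feature.
true_latent = np.array([[0.8]])
observation = decode(true_latent)[0]
particles = rng.normal(0.0, 1.0, (500, 1))
weights = np.full(500, 1.0 / 500)
for _ in range(30):
    particles, weights = pf_step(particles, weights, observation)
estimate = float(particles.mean())  # concentrates near 0.8
```

In the method proposed in the paper, the likelihood step would instead compare the sparse 3D silhouette descriptor predicted for each pose hypothesis against the observed ToF depth image, and one such filter would run per activity-specific embedding, with the activity whose hypotheses best match the observations being selected.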

[1] Reinhard Koch et al., "MixIn3D: 3D Mixed Reality with ToF-Camera," Dyn3D, 2009.

[2] Wojciech Matusik et al., "Practical motion capture in everyday surroundings," SIGGRAPH, 2007.

[3] Sebastian Thrun et al., "Real-time identification and localization of body parts from depth images," IEEE International Conference on Robotics and Automation, 2010.

[4] Thomas B. Moeslund et al., "Fusion of range and intensity information for view invariant gesture recognition," IEEE CVPR Workshops, 2008.

[5] David J. Fleet et al., "3D People Tracking with Gaussian Process Dynamical Models," IEEE CVPR, 2006.

[6] Miguel Á. Carreira-Perpiñán et al., "People Tracking with the Laplacian Eigenmaps Latent Variable Model," NIPS, 2007.

[7] Andrew E. Johnson et al., "Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999.

[8] David J. Fleet et al., "Priors for people tracking from small training sets," IEEE ICCV, 2005.

[9] Ahmed M. Elgammal et al., "The Role of Manifold Learning in Human Motion Analysis," Human Motion, 2006.

[10] Thomas Martinetz et al., "Scale-invariant range features for time-of-flight camera applications," IEEE CVPR Workshops, 2008.

[11] Cristian Sminchisescu et al., "Spectral Latent Variable Models for Perceptual Inference," IEEE ICCV, 2007.

[12] Ankur Agarwal et al., "Recovering 3D human pose from monocular images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.

[13] Reinhard Koch et al., "Time-of-Flight Sensors in Computer Graphics," Eurographics, 2009.

[14] Rasmus Larsen et al., "Analyzing Gait Using a Time-of-Flight Camera," SCIA, 2009.

[15] Mikhail Belkin et al., "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, 2003.

[16] Rémi Ronfard et al., "Free viewpoint action recognition using motion history volumes," Computer Vision and Image Understanding, 2006.

[17] Michael Isard et al., "A mixed-state condensation tracker with automatic model-switching," IEEE ICCV, 1998.

[18] Sebastian Thrun et al., "Real time motion capture using a single time-of-flight camera," IEEE CVPR, 2010.

[19] Joachim Hornegger et al., "Gesture recognition with a Time-Of-Flight camera," International Journal of Intelligent Systems Technologies and Applications, 2008.

[20] Behzad Dariush et al., "Controlled human pose estimation from depth image streams," IEEE CVPR Workshops, 2008.

[21] Joachim Hornegger et al., "3-D gesture-based scene navigation in medical imaging applications using Time-of-Flight cameras," IEEE CVPR Workshops, 2008.

[22] Rémi Ronfard et al., "Action Recognition from Arbitrary Views using 3D Exemplars," IEEE ICCV, 2007.

[23] Luc Van Gool et al., "Learning Generative Models for Multi-Activity Body Pose Estimation," International Journal of Computer Vision, 2008.

[24] Michael Isard et al., "CONDENSATION—Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, 1998.

[25] Dragomir Anguelov et al., "Object Pose Detection in Range Scan Data," IEEE CVPR, 2006.

[26] Nassir Navab et al., "Multiple-Activity Human Body Tracking in Unconstrained Environments," AMDO, 2010.

[27] Michael J. Black et al., "Detailed Human Shape and Pose from Images," IEEE CVPR, 2007.