Human Centered Scene Understanding Based on 3D Long-Term Tracking Data

Scene understanding approaches are mainly based on geometric information and do not consider human behavior. This paper introduces a novel human-centric scene understanding approach based on long-term tracking information. The tracking information is filtered and clustered, and areas offering meaningful functionalities for humans are modeled using kernel density estimation. This makes it possible to model walking and sitting areas within an indoor scene without considering any geometric information: the approach relies solely on continuous, noisy tracking data acquired from a 3D sensor monitoring the scene from a bird’s eye view. The proposed approach is evaluated on three datasets from two application domains (home and office environments), containing more than 180 days of tracking data.
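The filter, group, and density-estimation pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_by_height`, `gaussian_kde_2d`), the height thresholds, and the bandwidth are hypothetical, and the clustering step is replaced here by a simple height threshold that separates walking from sitting samples. Each tracking sample is assumed to be an `(x, y, h)` tuple giving a ground-plane position and a person height in meters.

```python
import math

def split_by_height(track, min_h=0.4, sit_max_h=1.2, max_h=2.2):
    """Filter noisy samples and split the rest into walking and sitting.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    walking, sitting = [], []
    for x, y, h in track:
        if h < min_h or h > max_h:
            continue  # implausible height: treat as tracking noise
        if h < sit_max_h:
            sitting.append((x, y))
        else:
            walking.append((x, y))
    return walking, sitting

def gaussian_kde_2d(points, bandwidth):
    """Return a density function built from 2D samples with an isotropic Gaussian kernel."""
    if not points:
        raise ValueError("at least one sample is required")
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

# Toy track: three walking samples, two sitting samples, one noisy sample.
track = [
    (0.0, 0.0, 1.7), (1.0, 0.0, 1.7), (2.0, 0.0, 1.6),
    (5.0, 5.0, 0.9), (5.1, 5.0, 0.9),
    (9.0, 9.0, 3.0),
]
walking, sitting = split_by_height(track)
walk_pdf = gaussian_kde_2d(walking, bandwidth=0.5)
sit_pdf = gaussian_kde_2d(sitting, bandwidth=0.5)
```

Evaluating `walk_pdf` and `sit_pdf` on a grid over the floor plane and thresholding the result would give candidate walking and sitting areas. On real long-term data, a vectorized estimator from a numerical library would be a more practical choice than this pure-Python loop.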
