Low-cost scene modeling using a density function improves segmentation performance

We propose a low cost and effective way to combine a free simulation software and free CAD models for modeling human-object interaction in order to improve human & object segmentation. It is intended for research scenarios related to safe human-robot collaboration (SHRC) and interaction (SHRI) in the industrial domain. The task of human and object modeling has been used for detecting activity, and for inferring and predicting actions, different from those works, we do human and object modeling in order to learn interactions in RGB-D data for improving segmentation. For this purpose, we define a novel density function to model a three dimensional (3D) scene in a virtual environment (VREP). This density function takes into account various possible configurations of human-object and object-object relationships and interactions governed by their affordances. Using this function, we synthesize a large, realistic and highly varied synthetic RGB-D dataset that we use for training. We train a random forest classifier, and the pixelwise predictions obtained is integrated as a unary term in a pairwise conditional random fields (CRF). Our evaluation shows that modeling these interactions improves segmentation performance by ~7% in mean average precision and recall over state-of-the-art methods that ignore these interactions in real-world data. Our approach is computationally efficient, robust and can run real-time on consumer hardware.

[1]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[2]  Nobuto Matsuhira,et al.  Virtual Robot Experimentation Platform V-REP: A Versatile 3D Robot Simulator , 2010, SIMPAR.

[3]  Bart Selman,et al.  Human Activity Detection from RGBD Images , 2011, Plan, Activity, and Intent Recognition.

[4]  Yun Jiang,et al.  Learning Object Arrangements in 3D Scenes using Human Context , 2012, ICML.

[5]  Zhuowen Tu,et al.  Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Peter Kontschieder,et al.  Structured class-labels in random forests for semantic image labelling , 2011, 2011 International Conference on Computer Vision.

[8]  Luc Van Gool,et al.  Efficient Real-Time Pixelwise Object Class Labeling for Safe Human-Robot Collaboration in Industrial Domain , 2015, MLIS@ICML.

[9]  Alois Knoll,et al.  Testing of Image Processing Algorithms on Synthetic Data , 2009, 2009 Fourth International Conference on Software Engineering Advances.

[10]  Slobodan Ilic,et al.  Framework for Generation of Synthetic Ground Truth Data for Driver Assistance Applications , 2013, GCPR.

[11]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Frank Dittrich,et al.  Pixelwise object class segmentation based on synthetic data using an optimized training strategy , 2014, 2014 First International Conference on Networks & Soft Computing (ICNSC2014).

[13]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Larry S. Davis,et al.  Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  R. B. Potts Some generalized order-disorder transformations , 1952, Mathematical Proceedings of the Cambridge Philosophical Society.

[18]  Luc Van Gool,et al.  Improving Human Pose Recognition Accuracy using CRF Modeling , 2015 .

[19]  Hedvig Kjellström,et al.  Functional object descriptors for human activity modeling , 2013, 2013 IEEE International Conference on Robotics and Automation.

[20]  Sebastian Nowozin,et al.  Decision tree fields , 2011, 2011 International Conference on Computer Vision.

[21]  Heinz Wörn,et al.  Probabilistic Modeling of Real-world Scenes in a Virtual Environment , 2015, GRAPP.

[22]  Eren Erdal Aksoy,et al.  Categorizing object-action relations from semantic scene graphs , 2010, 2010 IEEE International Conference on Robotics and Automation.