Human Centred Object Co-Segmentation

Co-segmentation is the automatic extraction of the common semantic regions given a set of images. Different from previous approaches mainly based on object visuals, in this paper, we propose a human centred object co-segmentation approach, which uses the human as another strong evidence. In order to discover the rich internal structure of the objects reflecting their human-object interactions and visual similarities, we propose an unsupervised fully connected CRF auto-encoder incorporating the rich object features and a novel human-object interaction representation. We propose an efficient learning and inference algorithm to allow the full connectivity of the CRF with the auto-encoder, that establishes pairwise relations on all pairs of the object proposals in the dataset. Moreover, the auto-encoder learns the parameters from the data itself rather than supervised learning or manually assigned parameters in the conventional CRF. In the extensive experiments on four datasets, we show that our approach is able to extract the common objects more accurately than the state-of-the-art co-segmentation algorithms.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  Fei-Fei Li,et al.  Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Pascal Fua,et al.  Learning Structured Models for Segmentation of 2-D and 3-D Imagery , 2015, IEEE Transactions on Medical Imaging.

[4]  Jiebo Luo,et al.  Interactively Co-segmentating Topically Related Images with Intelligent Scribble Guidance , 2011, International Journal of Computer Vision.

[5]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Takeo Kanade,et al.  Distributed cosegmentation via submodular optimization on anisotropic diffusion , 2011, 2011 International Conference on Computer Vision.

[7]  King Ngi Ngan,et al.  Object Co-Segmentation Based on Shortest Path Algorithm and Saliency Model , 2012, IEEE Transactions on Multimedia.

[8]  Noah A. Smith,et al.  Conditional Random Field Autoencoders for Unsupervised Structured Prediction , 2014, NIPS.

[9]  Ivan Laptev,et al.  Learning person-object interactions for action recognition in still images , 2011, NIPS.

[10]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[11]  Yun Jiang,et al.  Infinite Latent Conditional Random Fields for Modeling Environments through Humans , 2013, Robotics: Science and Systems.

[12]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[13]  Stephen Lin,et al.  Object-based RGBD image co-segmentation with mutex constraint , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[15]  Jia Xu,et al.  Analyzing the Subspace Structure of Related Images: Concurrent Segmentation of Image Sets , 2012, ECCV.

[16]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Rita Cucchiara,et al.  Learning superpixel relations for supervised image segmentation , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[18]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[19]  Nanning Zheng,et al.  Modeling 4D Human-Object Interactions for Event and Object Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[22]  Rita Cucchiara,et al.  Scene segmentation using temporal clustering for accessing and re-using broadcast video , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[23]  Yun Jiang,et al.  Hallucinated Humans as the Hidden Context for Labeling 3D Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Mario Fritz,et al.  Multi-class Video Co-segmentation with a Generative Multi-video Model , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Stephen Lin,et al.  Object-Based Multiple Foreground Video Co-segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Vladimir Kolmogorov,et al.  Object cosegmentation , 2011, CVPR 2011.

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[30]  Charless C. Fowlkes,et al.  Discriminative models for static human-object interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[31]  Jean Ponce,et al.  Multi-class cosegmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Ashutosh Saxena,et al.  Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception , 2014, Robotics: Science and Systems.

[33]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[34]  Silvio Savarese,et al.  Watch-n-patch: Unsupervised understanding of actions and relations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Juan Carlos Niebles,et al.  Spatio-temporal Human-Object Interactions for Action Recognition in Videos , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[36]  Vladimir Kolmogorov,et al.  Cosegmentation Revisited: Models and Optimization , 2010, ECCV.

[37]  Vikas Singh,et al.  An efficient algorithm for Co-segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[38]  Hema Swetha Koppula,et al.  Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..