Multimodal Mixed Conditional Random Field Model for Category-Independent Object Detection

Category-independent object detection is useful for many robot vision tasks. Most existing methods rank a large number of candidate regions by their object-likeness. However, obtaining a sufficient object coverage rate requires sampling too many regions. In this paper, we present a novel method that directly detects and localizes category-independent objects. We develop a “mixed robust higher-order conditional random field” model, which combines 2D and 3D data in a unified framework. A set of novel features is developed based on 2D and 3D saliency and oversegments, and the potentials used in the model are computed from these features. Extensive experiments are carried out on a public RGB-D dataset. Compared with state-of-the-art ranking methods, our method achieves comparable category-independent object detection performance without sampling a large number of extra regions.
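The abstract does not state the model's energy function. As rough orientation only, a generic higher-order CRF energy with a robust higher-order term, in the simplified form used by Kohli et al. for enforcing label consistency, can be sketched as follows. The symbols ($V$, $\mathcal{E}$, $S$, $Q$, $\gamma_{\max}$, $N$) are assumptions introduced for illustration, and how the paper's "mixed" model assigns 2D and 3D features to these potentials is not specified here.

```latex
% Generic higher-order CRF energy: unary, pairwise, and segment-level terms.
% V: set of variables (e.g. pixels or oversegments), \mathcal{E}: neighbor pairs,
% S: set of higher-order cliques (e.g. oversegments). All names are illustrative.
E(\mathbf{x}) = \sum_{i \in V} \psi_i(x_i)
              + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
              + \sum_{c \in S} \psi_c(\mathbf{x}_c)

% Robust higher-order potential (simplified form):
% N(\mathbf{x}_c) counts the variables in clique c that do not take the dominant label,
% Q is a truncation parameter, \gamma_{\max} is the maximum penalty.
\psi_c(\mathbf{x}_c) =
\begin{cases}
  N(\mathbf{x}_c)\,\dfrac{\gamma_{\max}}{Q}, & N(\mathbf{x}_c) \le Q \\[6pt]
  \gamma_{\max}, & \text{otherwise}
\end{cases}
```

In this form the penalty grows linearly with the number of inconsistent variables in a clique and saturates at $\gamma_{\max}$, so a segment with a few outliers is penalized less than one with no clear dominant label.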
