Unsupervised object class discovery via saliency-guided multiple class learning

Discovering object classes from images in a fully unsupervised way is an intrinsically ambiguous task; saliency detection, however, can ease the burden on unsupervised learning. We develop an algorithm that simultaneously localizes objects and discovers object classes via bottom-up (saliency-guided) multiple class learning (bMCL), and make the following contributions: (1) we adopt saliency detection to convert unsupervised learning into multiple instance learning, formulated as bMCL; (2) we use the Discriminative EM (DiscEM) algorithm to solve the bMCL problem and show DiscEM's connection to the MIL-Boost method [34]; (3) object localization, object class discovery, and object detector training are performed simultaneously in an integrated framework; (4) we observe significant improvements over existing methods for multi-class object discovery. In addition, we show that single-class localization is a special case of the bMCL framework and demonstrate the advantage of bMCL over purely data-driven saliency methods.
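
As a rough illustration of contribution (1), the sketch below shows one way saliency scores could turn unlabeled images into multiple-instance bags. It is not the authors' implementation. The helper names, the per-window saliency scores, and the fixed threshold are all hypothetical, and the noisy-OR bag probability is the standard multiple-instance model, assumed here rather than taken from the abstract.

```python
import numpy as np


def build_bags(saliency_scores, threshold=0.5):
    """Split each image's candidate windows into two bags by saliency.

    saliency_scores: one 1-D sequence per image, one score per window.
    threshold: hypothetical cut-off, not a value from the paper.
    Returns a list of (positive_indices, negative_indices) per image.
    """
    bags = []
    for scores in saliency_scores:
        scores = np.asarray(scores, dtype=float)
        positive = np.flatnonzero(scores >= threshold)  # salient windows
        negative = np.flatnonzero(scores < threshold)   # background windows
        bags.append((positive, negative))
    return bags


def noisy_or_bag_probability(instance_probs):
    """Probability that at least one instance in a bag is positive."""
    p = np.asarray(instance_probs, dtype=float)
    return 1.0 - np.prod(1.0 - p)
```

For example, `build_bags([[0.9, 0.1, 0.6]])` puts windows 0 and 2 in the positive bag and window 1 in the negative bag. A bag whose two instances each have probability 0.5 of being positive gets a noisy-OR bag probability of 0.75.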

[1]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[2]  Pietro Perona,et al.  Is bottom-up attention useful for object recognition? , 2004, CVPR 2004.

[3]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Shang-Hong Lai,et al.  From co-saliency to co-segmentation: An efficient and fully unsupervised energy minimization model , 2011, CVPR 2011.

[5]  Ali Farhadi,et al.  Scene Discovery by Matrix Factorization , 2008, ECCV.

[6]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[7]  Zhi-Hua Zhou,et al.  Multi-instance clustering with applications to multi-instance prediction , 2009, Applied Intelligence.

[8]  Pietro Perona,et al.  Multiple Component Learning for Object Detection , 2008, ECCV.

[9]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[10]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[11]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res.

[12]  Tsuhan Chen,et al.  Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13]  Trevor Darrell,et al.  Unsupervised Learning of Categories from Sets of Partially Matching Image Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Thomas Deselaers,et al.  Localizing Objects While Learning Their Appearance , 2010, ECCV.

[15]  Yong Jae Lee,et al.  Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.

[16]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Yong Jae Lee,et al.  Foreground Focus: Unsupervised Learning from Partially Matching Images , 2009, International Journal of Computer Vision.

[18]  Samuel Kaski,et al.  Expectation maximization algorithms for conditional likelihoods , 2005, ICML '05.

[19]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Andrew Zisserman,et al.  An Exemplar Model for Learning Object Classes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm , 1977.

[23]  Luo Si,et al.  M3IC: Maximum Margin Multiple Instance Clustering , 2009, IJCAI.

[24]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000.

[25]  Paul A. Viola,et al.  Multiple Instance Boosting for Object Detection , 2005, NIPS.

[26]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent , 1999, NIPS.

[27]  Alex Pentland,et al.  Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm , 1998, NIPS.

[28]  Serge J. Belongie,et al.  Simultaneous Learning and Alignment: Multi-Instance and Multi-Pose Learning , 2008.

[29]  Luc Van Gool,et al.  Object Detection by Contour Segment Networks , 2006, ECCV.

[30]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Yong Jae Lee,et al.  Shape discovery from unlabeled image collections , 2009, CVPR.

[32]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[33]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Christoph H. Lampert,et al.  Unsupervised Object Discovery: A Comparison , 2010, International Journal of Computer Vision.

[35]  Dale Schuurmans,et al.  Maximum Margin Clustering , 2004, NIPS.

[36]  Fei Wang,et al.  Maximum Margin Multiple Instance Clustering With Applications to Image and Text Clustering , 2011, IEEE Transactions on Neural Networks.

[37]  Hui Zhang,et al.  Localized Content-Based Image Retrieval , 2008, IEEE Trans. Pattern Anal. Mach. Intell.

[38]  Fei Wang,et al.  Efficient multiclass maximum margin clustering , 2008, ICML '08.

[39]  Christos Faloutsos,et al.  Unsupervised modeling of object categories using link analysis techniques , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Long Zhu,et al.  Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jian Sun,et al.  Salient object detection by composition , 2011, 2011 International Conference on Computer Vision.

[42]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.