LOCUS: Learning Object Classes with Unsupervised Segmentation

We address the problem of learning object class models and object segmentations from unannotated images. We introduce LOCUS (learning object classes with unsupervised segmentation), which uses a generative probabilistic model to combine bottom-up cues of color and edge with top-down cues of shape and pose. A key aspect of this model is that the object appearance is allowed to vary from image to image, which accommodates significant within-class variation. By iteratively updating its belief in the object's position, size, segmentation, and pose, LOCUS avoids making hard decisions about any of these quantities, so each can be refined at any stage. We show that LOCUS successfully learns an object class model from unlabeled images, whilst also giving segmentation accuracies that rival existing supervised methods. Finally, we demonstrate simultaneous recognition and segmentation in novel images using the learned models for a number of object classes, as well as unsupervised object discovery and tracking in video.
