Hybrid learning of large jigsaws

A jigsaw is a recently proposed generative model that describes an image as a composition of non-overlapping patches of varying shape, extracted from a latent image. By learning the latent jigsaw image which best explains a set of images, it is possible to discover the shape, size and appearance of repeated structures in the images. A challenge when learning this model is the very large space of possible jigsaw pixels which can potentially be used to explain each image pixel. The previous method of inference for this model scales linearly with the number of jigsaw pixels, making it unusable for learning the large jigsaws needed for many practical applications. In this paper, we make three contributions that enable the learning of large jigsaws -a novel sparse belief propagation algorithm, a hybrid method which significantly improves the sparseness of this algorithm, and a method that uses these techniques to make learning of large jigsaws feasible. We provide detailed analysis of how our hybrid inference method leads to significant savings in memory and computation time. To demonstrate the success of our method, we present experimental results applying large jigsaws to an object recognition task.

[1]  Shimon Ullman,et al.  Combining Top-Down and Bottom-Up Segmentation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Thomas Hofmann,et al.  Clustering appearance and shape by learning jigsaws , 2007 .

[4]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Guillaume Bouchard,et al.  The Tradeoff Between Generative and Discriminative Classifiers , 2004 .

[6]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Nikos Komodakis,et al.  Image Completion Using Global Optimization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  James M. Coughlan,et al.  Finding Deformable Shapes Using Loopy Belief Propagation , 2002, ECCV.

[10]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[11]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[12]  Geoffrey E. Hinton,et al.  Inferring Motor Programs from Images of Handwritten Digits , 2005, NIPS.

[13]  Brendan J. Frey,et al.  Epitomic analysis of appearance and shape , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Christopher Joseph Pal,et al.  Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training Of Conditional Random Fields , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.