Spatial Latent Dirichlet Allocation

In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a "bag-of-words". It is also critical to properly design "words" and "documents" when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.

[1]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[2]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[3]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[7]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[9]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[10]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[12]  W. Eric L. Grimson,et al.  Unsupervised Activity Perception by Hierarchical Bayesian Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Bill Triggs,et al.  Region Classification with Markov Field Aspect Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Li Fei-Fei,et al.  Spatially coherent latent topic model for concurrent object segmentation and classification , 2007 .

[15]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.