Paper information - Spring Lattice Counting Grids: Scene Recognition Using Deformable Positional Constraints

Adopting the Counting Grid (CG) representation [1], the Spring Lattice Counting Grid (SLCG) model uses a grid of feature counts to capture the spatial layout that a variety of images tend to follow. Images are mapped to the counting grid with their features rearranged so as to balance mapping quality against the extent of the necessary rearrangement. In particular, the feature sets originating from different image sectors are mapped to different sub-windows in the counting grid, in a configuration that is close to, but not exactly the same as, the configuration of the source sectors. The distribution over deformations of the sector configuration is learnable using a new spring lattice model, while the rearrangement of features within a sector is unconstrained. As a result, the CG model gains a more appropriate level of invariance to realistic image transformations such as viewpoint changes, rotations, or changes in scale. We tested SLCG on standard scene recognition datasets and on a dataset collected with a wearable camera that recorded the wearer's visual input over three weeks. Our algorithm correctly classifies the visited locations more than 80% of the time, outperforming previous approaches to visual location recognition. At this level of performance, a variety of real-world applications of wearable cameras become feasible.
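
The trade-off described in the abstract, between how well a sector's features fit a sub-window and how far that sub-window moves from its expected position, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy grid, the fixed quadratic spring penalty, and the per-sector anchors are all assumptions made for illustration; the paper's spring lattice instead learns a distribution over deformations of the whole sector configuration.

```python
import math

def window_histogram(grid, top, left, height, width):
    """Average the per-cell feature distributions inside a window (wraps around)."""
    rows, cols = len(grid), len(grid[0])
    n_features = len(grid[0][0])
    hist = [0.0] * n_features
    for dy in range(height):
        for dx in range(width):
            cell = grid[(top + dy) % rows][(left + dx) % cols]
            for z in range(n_features):
                hist[z] += cell[z]
    area = height * width
    return [h / area for h in hist]

def log_likelihood(counts, hist, eps=1e-12):
    """Multinomial log-likelihood of a bag of feature counts, up to a constant."""
    return sum(c * math.log(h + eps) for c, h in zip(counts, hist))

def map_sector(grid, counts, anchor, size, stiffness=1.0, max_shift=1):
    """Pick the window shift that best trades fit against a spring penalty.

    Assumption: a quadratic penalty on the shift away from `anchor`, with a
    hand-set `stiffness`; the paper learns its deformation model instead.
    """
    best = None
    for sy in range(-max_shift, max_shift + 1):
        for sx in range(-max_shift, max_shift + 1):
            hist = window_histogram(grid, anchor[0] + sy, anchor[1] + sx, size, size)
            penalty = 0.5 * stiffness * (sy * sy + sx * sx)
            score = log_likelihood(counts, hist) - penalty
            if best is None or score > best[0]:
                best = (score, (sy, sx))
    return best

# Toy 4x4 grid with 2 feature types: the left half favors feature 0,
# the right half favors feature 1 (hypothetical values).
grid = [[[0.9, 0.1] if x < 2 else [0.1, 0.9] for x in range(4)] for _ in range(4)]
sector_counts = [0, 10]  # a sector containing only feature 1
score, shift = map_sector(grid, sector_counts, anchor=(0, 1), size=2)
print(shift)
```

With the soft spring used here, the sector slides one column to the right, onto the part of the grid that explains its features better. Raising `stiffness` (for example to 100) keeps the sector at its anchor, because the deformation then costs more than the gain in fit.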

Nebojsa Jojic | Alessandro Perina

[1] Nebojsa Jojic, et al. Structural epitome: a way to summarize one's visual experience, 2010, NIPS.

[2] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[3] Nebojsa Jojic, et al. Free energy score space, 2009, NIPS.

[4] Michael Isard, et al. Nonparametric belief propagation, 2010, Commun. ACM.

[5] Michael I. Mandel, et al. Visual Hand Tracking Using Nonparametric Belief Propagation, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[6] Antonio Torralba, et al. Context-based vision system for place and object recognition, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7] Nebojsa Jojic, et al. Multidimensional counting grids: Inferring word order from disordered bags of words, 2011, UAI.

[8] Vladlen Koltun, et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, 2011, NIPS.

[9] Svetlana Lazebnik, et al. Scene recognition and weakly supervised object localization with deformable part-based models, 2011, 2011 International Conference on Computer Vision.

[10] Antti Oulasvirta, et al. Computer Vision – ECCV 2006, 2006, Lecture Notes in Computer Science.

[11] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.

[13] Antonio Torralba, et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.

[14] Pietro Perona, et al. Object class recognition by unsupervised scale-invariant learning, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[15] Pedro F. Felzenszwalb, et al. Reconfigurable models for scene recognition, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Pietro Perona, et al. A Bayesian hierarchical model for learning natural scene categories, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17] Fei-Fei Li, et al. Large Margin Learning of Upstream Scene Understanding Models, 2010, NIPS.

[18] Andrew Zisserman, et al. Scene Classification Via pLSA, 2006, ECCV.

[19] Antonio Torralba, et al. Recognizing indoor scenes, 2009, CVPR.

[20] Nebojsa Jojic, et al. Image analysis by counting on a grid, 2011, CVPR 2011.

[21] Brendan J. Frey, et al. Epitomic analysis of appearance and shape, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22] Michael Isard, et al. PAMPAS: real-valued graphical models for computer vision, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[23] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.