Spatial context for visual vocabulary construction

The bag-of-visual-words model has been widely used in applications such as object recognition, image categorization, and visual information retrieval. However, most existing approaches construct the visual vocabulary by simply clustering image regions represented with low-level visual features, so the spatial context of those regions is not well utilized. In this paper, we present two techniques that take this context into account. One is based on the Self-Organizing Map for Adaptive Processing of Structured Data (SOM-SD), and the other is based on our proposed Hierarchical Fuzzy C-means with Spatial Constraints (FCM-HS). We have employed both methods together with language modeling for image categorization. Experimental results on the Caltech dataset demonstrate that both methods achieve better classification performance than methods that do not consider spatial context. We also compare the two methods.
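
For reference, the baseline described above (a vocabulary built by clustering low-level descriptors with no spatial context) can be sketched as follows. This is a minimal illustration and not the SOM-SD or FCM-HS method of this paper: it assumes plain k-means as the clustering step and region descriptors stored as rows of a NumPy float array, and the function names build_vocabulary, assign_words, and bow_histogram are hypothetical.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Map each descriptor to the index of its nearest cluster center."""
    # Squared Euclidean distance between every descriptor and every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain k-means over region descriptors; spatial context is ignored."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct descriptors chosen at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Normalized visual-word histogram for the descriptors of one image."""
    words = assign_words(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Because each descriptor is assigned to a word independently of its neighbors, two images whose regions are arranged differently can yield identical histograms. That loss of layout information is the limitation the two proposed techniques aim to address.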
