A More Effective Method for Image Representation: Topic Model Based on Latent Dirichlet Allocation

The Bag-of-Words (BoW) representation is widely used in state-of-the-art image retrieval. However, as the number of images grows rapidly, the dimension of the visual dictionary increases substantially, which leads to high storage and computation costs. In addition, local features do not convey semantic information, which is very important in image retrieval. In this paper, we propose to use "topics" instead of "visual words" as the image representation, applying a topic model to reduce the feature dimension and to mine more high-level semantic information. We call this representation Bag-of-Topics (BoT); a topic model is a type of statistical model for discovering the abstract "topics" underlying the words. We extract the topics with Latent Dirichlet Allocation (LDA) and calculate the similarity between images using the BoT representation instead of using BoW directly. The results show that the dimension of the image representation is reduced significantly, while the retrieval performance is improved.
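The pipeline described above (visual words per image, LDA topics, similarity on topic vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the inference procedure or the similarity measure, so the collapsed Gibbs sampler, the cosine similarity, the hyperparameters `alpha` and `beta`, the function names, and the toy data are all assumptions made for illustration.

```python
import math
import random


def fit_bag_of_topics(images, vocab_size, num_topics,
                      alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    images: list of lists of visual-word ids in [0, vocab_size).
    Returns one topic distribution (a BoT vector) per image.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in images]          # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                           # n(k)
    assign = []

    # Random initial topic assignment for every visual word occurrence.
    for d, words in enumerate(images):
        z_d = []
        for w in words:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, words in enumerate(images):
            for i, w in enumerate(words):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic (unnormalized).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-image topic proportions: the BoT vectors.
    theta = []
    for d, words in enumerate(images):
        denom = len(words) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in topics])
    return theta


def cosine_similarity(u, v):
    """Cosine similarity between two BoT vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: 4 images quantized against a 6-word visual dictionary.
toy_images = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [1, 2, 0, 2, 1, 0, 2, 2],
    [3, 4, 5, 3, 4, 5, 5, 4],
    [5, 4, 3, 3, 5, 4, 3, 3],
]
theta = fit_bag_of_topics(toy_images, vocab_size=6, num_topics=2)
for d, vec in enumerate(theta):
    print(d, [round(p, 3) for p in vec])
print("sim(0, 1) =", round(cosine_similarity(theta[0], theta[1]), 3))
print("sim(0, 2) =", round(cosine_similarity(theta[0], theta[2]), 3))
```

In this sketch each image is described by `num_topics` numbers rather than by a histogram over the whole visual dictionary, which is the dimensionality reduction the abstract refers to. In practice the visual words would come from quantized local descriptors, and a library LDA implementation would replace the toy sampler.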
