Distributed image understanding with semantic dictionary and semantic expansion

Web-scale image understanding is attracting growing attention in the computer vision and multimedia communities. To address the key problems of visual polysemia and concept polymorphism in image understanding, this paper proposes a semantic dictionary that describes images at the semantic level. The semantic dictionary characterizes the probability distribution between visual appearances and semantic concepts, and learning it is formulated as a minimization problem. Mixed-norm regularization is adopted in this optimization to learn the concept membership distribution of each visual appearance. Furthermore, to improve the generalization ability of the semantic description, we propose a semantic expansion technique, in which a concept transfer matrix is learned to quantify the implicit relevance among concepts. Finally, a distributed framework based on the semantic dictionary is constructed to speed up large-scale image understanding. The semantic dictionary is validated on large-scale semantic image search and image annotation tasks.
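As a rough illustration of the kind of optimization the abstract describes, the sketch below learns a matrix mapping visual appearances to concepts under a mixed-norm penalty, then applies a concept transfer matrix for semantic expansion. The abstract does not state the objective, so everything here is an assumption: the least-squares loss, the choice of the l2,1 norm as the mixed norm, the proximal gradient solver, and the names X (visual-word histograms), Y (concept labels), W (dictionary), T (concept transfer matrix), and lam are all hypothetical and should not be read as the paper's method.

```python
import numpy as np

def prox_l21(W, tau):
    """Row-wise group soft-thresholding: the proximal operator of
    tau * sum_i ||W[i, :]||_2 (an l2,1 mixed norm, assumed here)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

def objective(X, Y, W, lam):
    """Assumed objective: 0.5 * ||XW - Y||_F^2 + lam * ||W||_{2,1}."""
    residual = X @ W - Y
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.linalg.norm(W, axis=1))

def learn_dictionary(X, Y, lam=0.1, n_iter=200):
    """Proximal gradient descent on the assumed objective.
    Rows of W correspond to visual appearances, columns to concepts."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    lipschitz = max(np.linalg.norm(X, 2) ** 2, 1e-12)  # squared spectral norm
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox_l21(W - step * grad, step * lam)
    return W

def expand(concept_scores, T):
    """Hypothetical semantic expansion: propagate scores to related
    concepts through a concept transfer matrix T (concepts x concepts)."""
    return concept_scores @ T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 20))          # 50 images, 20 visual words (synthetic)
    Y = rng.random((50, 5))           # 5 concepts (synthetic)
    W = learn_dictionary(X, Y, lam=0.5)
    T = np.eye(5)                     # identity: no expansion
    print(expand(X @ W, T).shape)
```

The row-wise penalty drives entire rows of W to zero, so a visual appearance that is uninformative for all concepts is dropped from the description; other mixed norms would induce different sparsity patterns.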
