Enhancing the Quality of Image Tagging Using a Visio-Textual Knowledge Base

Automatic image tagging is important for image understanding and for tag-based applications such as image retrieval, visual question answering, and image captioning. Although existing tagging methods incorporate both visual and textual information to assign or refine tags, they fall short in tag-image relevance, completeness, and precision, which degrades the performance of tag-based applications. To bridge this gap, we propose a novel framework for tag assignment using knowledge embedding (TAKE) from a proposed external knowledge base, considering the properties of Rarity, Newness, Generality, and Naturalness (RNGN properties). These properties help provide a rich semantic representation of images. Existing knowledge bases provide multiple types of relations, but they extract them from only one modality, either textual or visual, which limits their effectiveness in image-related applications. We construct a simple yet effective Visio-Textual Knowledge Base (VTKB) with only four relations, using reliable resources such as Wikipedia, thesauri, and dictionaries. Our large-scale experiments demonstrate that TAKE combined with VTKB assigns a larger number of high-quality tags than TAKE combined with the ConceptNet or ImageNet knowledge bases. We also evaluate the effectiveness of knowledge embedding through VTKB for image tagging and tag-based image retrieval (TBIR).
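The abstract describes VTKB as a knowledge base restricted to four relations, but it does not name those relations or specify how TAKE embeds them. The toy sketch below only illustrates the general shape of such a resource: a triple store that accepts exactly four relation types, plus a one-hop tag expansion step. The relation labels (`rel_a` to `rel_d`), the class `ToyKnowledgeBase`, and the function `expand_tags` are hypothetical placeholders. The ranking is plain lookup and vote counting, not the paper's knowledge embedding or its RNGN scoring.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder labels: the abstract says VTKB uses four
# relations but does not name them, so these are NOT the paper's relations.
RELATIONS = ("rel_a", "rel_b", "rel_c", "rel_d")


class ToyKnowledgeBase:
    """Minimal triple store that only accepts the four placeholder relations."""

    def __init__(self):
        # head concept -> list of (relation, tail concept) pairs
        self._edges = defaultdict(list)

    def add(self, head, relation, tail):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._edges[head].append((relation, tail))

    def neighbours(self, concept):
        # .get avoids inserting empty entries into the defaultdict
        return self._edges.get(concept, [])


def expand_tags(seed_tags, kb, top_k=3):
    """Propose new tags reachable in one hop from an image's seed tags.

    Each candidate gets one vote per supporting (seed, relation) edge;
    this is simple counting, not knowledge embedding.
    """
    seeds = set(seed_tags)
    votes = Counter()
    for tag in seeds:
        for _relation, tail in kb.neighbours(tag):
            if tail not in seeds:
                votes[tail] += 1
    # Highest vote count first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:top_k]]


if __name__ == "__main__":
    kb = ToyKnowledgeBase()
    kb.add("dog", "rel_a", "animal")
    kb.add("cat", "rel_a", "animal")
    kb.add("dog", "rel_b", "tail")
    kb.add("grass", "rel_c", "green")
    print(expand_tags(["dog", "cat", "grass"], kb))  # ['animal', 'green', 'tail']
```

Restricting the store to a fixed relation set mirrors the abstract's point that VTKB deliberately uses only four relations rather than the many relation types found in larger knowledge bases.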
