Understanding the Gist of Images - Ranking of Concepts for Multimedia Indexing

As multimedia data is continuously generated, stored, and distributed, multimedia indexing, whose purpose is to group similar data, becomes more important than ever. Related work frames understanding the gist (i.e., the message) of a multimedia instance as a ranking of concepts from a knowledge base, namely Wikipedia. We cast the task of multimedia indexing as a gist understanding problem. Our pipeline benefits from external knowledge and two consecutive learning-to-rank (l2r) settings. The first l2r produces a ranking of concepts representing the respective multimedia instance. The second l2r produces a mapping between the concept representation of an instance and the targeted class topic(s) for the multimedia indexing task. The evaluation on an established large corpus (MIRFlickr25k, with 25,000 images) yields a MAP of 61.42 and shows that the multimedia indexing task benefits from understanding the gist. The presented end-to-end setting thus outperforms DBM and competes with hashing-based methods.
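The reported figure is a mean average precision (MAP). As a reminder of what that metric measures, the following is a minimal sketch of the standard MAP definition. It is not the evaluation script used for the experiments: the function names and the boolean-list input format are assumptions made for illustration, and the sketch assumes each ranking covers all relevant items for its query. The result lies in [0, 1], whereas the abstract reports the value on a 0 to 100 scale.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision of one ranking.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    (Hypothetical input format, chosen for this sketch.)
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Normalise by the number of relevant items found in the ranking.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-ranking average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking with relevant items at ranks 1 and 3 has precision 1/1 and 2/3 at those ranks, so its average precision is their mean.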
