Training-free indexing refinement for visual media via multi-semantics

Indexing of visual media based on content analysis has moved beyond the use of individual concept detectors, and the focus is now on combining concepts by post-processing the outputs of individual concept detections. Because training corpora are limited in availability and are usually sparsely and imprecisely labelled with concept groundtruth, training-based refinement methods for semantic indexing of visual media struggle to correctly capture relationships between concepts, including co-occurrence and ontological relationships. In contrast to the training-dependent methods which dominate this field, this paper presents a training-free refinement (TFR) algorithm for enhancing the semantic indexing of visual media based purely on concept detection results, making the semantic refinement of initial concept detections both practical and flexible. This is achieved using what can be called multi-semantics, i.e. factoring in semantics from multiple sources. In this paper, global and temporal-neighbourhood information inferred from the original concept detections, captured through weighted non-negative matrix factorization and neighbourhood-based graph propagation respectively, are both used to refine the semantics. Furthermore, any available ontological relationships among concepts can be integrated into the model as an additional source of external a priori knowledge. Extensive experiments on two heterogeneous datasets, images from wearable cameras and videos from TRECVid, demonstrate the efficacy of the proposed TFR solution.
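To make the two training-free ingredients named above concrete, the sketch below illustrates one possible realisation: a weighted non-negative matrix factorization of the item-by-concept score matrix to exploit global structure, and a simple temporal-neighbourhood propagation over adjacent items, fused by a convex combination. The function names, the use of the detection scores themselves as confidence weights, the fusion parameter alpha, and all dimensions are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of training-free concept-score refinement (assumptions noted above).
import numpy as np

def weighted_nmf(V, W, rank=20, n_iter=200, eps=1e-9, seed=0):
    """Factor V ~= U @ H under element-wise weights W, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    U = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        UH = U @ H
        U *= ((W * V) @ H.T) / (((W * UH) @ H.T) + eps)
        UH = U @ H
        H *= (U.T @ (W * V)) / ((U.T @ (W * UH)) + eps)
    return U @ H  # globally smoothed concept scores

def temporal_propagation(V, window=2, decay=0.5):
    """Propagate scores between temporally neighbouring items (shots/images)."""
    refined = V.copy()
    norm = np.ones(len(V))
    for d in range(1, window + 1):
        w = decay ** d
        refined[d:] += w * V[:-d]    # influence from earlier neighbours
        refined[:-d] += w * V[d:]    # influence from later neighbours
        norm[d:] += w
        norm[:-d] += w
    return refined / norm[:, None]

def refine(V, alpha=0.5):
    """Fuse global (WNMF) and local (temporal-neighbourhood) semantics, training-free."""
    global_scores = weighted_nmf(V, W=V)   # assumption: scores double as confidence weights
    local_scores = temporal_propagation(V)
    return alpha * global_scores + (1 - alpha) * local_scores

# Usage: V holds initial detector scores for, say, 500 images x 60 concepts.
V = np.random.rand(500, 60)
V_refined = refine(V)
```

Ontological relationships, where available, could be folded in as a further additive term over the refined scores; that extension is omitted here for brevity.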
