From mid-level to high-level: Semantic inference for multimedia retrieval

The problem of bridging the semantic gap can be approached by dividing the metadata extracted from multimedia content into three levels (low, mid and high) according to their degree of semantic abstraction, and then defining the mappings between those levels. This paper proposes a scheme for extracting high-level semantic information from mid-level features, which can be applied to highly semantic queries in image retrieval. The mid-level features used in this research carry some semantic meaning but are not directly useful in real retrieval scenarios. They usually have strong relationships to high-level queries, yet these relationships are often ignored because they are implicit. The proposed approach aims to uncover the hidden interrelationships between mid-level features and high-level query terms by learning a Bayesian network model from a small amount of training data. Semantic inference and reasoning are then carried out on the learned Bayesian network to decide whether a video is relevant to a high-level query. The extracted high-level semantic terms can then be attached to the video content as annotations for future retrieval. Two experimental scenarios were considered, and experiments on RUSHES videos produced satisfactory results.
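
To make the inference step concrete, the following is a minimal sketch rather than the method used in the paper. It assumes the simplest possible Bayesian network structure, in which a binary high-level query node is the single parent of every binary mid-level feature node (a naive Bayes structure), and it estimates the conditional probability tables from a handful of labelled examples with Laplace smoothing. The feature names `vegetation` and `face`, the function names, and the training data are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def learn_network(samples, alpha=1.0):
    """Estimate the prior and conditional probability tables.

    samples: list of (features, relevant) pairs, where features maps a
    mid-level feature name to True/False and relevant is the high-level label.
    alpha: Laplace smoothing constant (an assumption, useful for small data).
    """
    feature_names = sorted(samples[0][0])
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for features, relevant in samples:
        class_counts[relevant] += 1
        for name in feature_names:
            if features[name]:
                feature_counts[relevant][name] += 1

    total = len(samples)
    # P(query = c), smoothed over the two possible values of c.
    prior = {c: (class_counts[c] + alpha) / (total + 2 * alpha)
             for c in (True, False)}
    # P(feature = True | query = c) for every mid-level feature.
    cpt = {c: {name: (feature_counts[c][name] + alpha)
                     / (class_counts[c] + 2 * alpha)
               for name in feature_names}
           for c in (True, False)}
    return prior, cpt


def posterior_relevant(model, features):
    """Return P(query = True | observed mid-level features)."""
    prior, cpt = model
    joint = {}
    for c in (True, False):
        p = prior[c]
        for name, p_true in cpt[c].items():
            p *= p_true if features[name] else 1.0 - p_true
        joint[c] = p
    return joint[True] / (joint[True] + joint[False])


# Hypothetical training data: three relevant and three irrelevant shots.
training = [
    ({"vegetation": True, "face": False}, True),
    ({"vegetation": True, "face": False}, True),
    ({"vegetation": True, "face": True}, True),
    ({"vegetation": False, "face": True}, False),
    ({"vegetation": False, "face": True}, False),
    ({"vegetation": False, "face": False}, False),
]
model = learn_network(training)
score = posterior_relevant(model, {"vegetation": True, "face": False})
is_relevant = score > 0.5  # threshold chosen for illustration only
```

A video whose posterior exceeds the chosen threshold would be annotated with the high-level term. A learned network with richer structure, as the abstract suggests, would replace the fixed parent-child layout assumed here.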
