Visual Information Analysis for Interactive TV Applications

Since its introduction, television has offered millions of users a non-interactive experience in which viewers participate only as passive consumers of audiovisual content. More recently, the proliferation and success of the Internet, and the widespread appreciation of the interaction possibilities it offers, gave rise to the idea of Interactive TV: a television broadcast in which users do not merely consume content passively but, as on the Web, can navigate across multiple pieces of content by following links similar in nature to the hypertext links between textual documents. In this article, we discuss the visual information analysis technologies and tools needed to support the interlinking of visual content so that users can navigate between content fragments. We cover analysis technologies ranging from the fragmentation of video content into temporal units (shots, scenes) to the labeling of visual content via concept-based annotation and the re-detection of objects of interest in the video. Such technologies are necessary for enabling video hyperlinking, so that, for example, an object of interest in one video segment can be linked to other relevant segments of the same video or to entirely different videos that relate to it. For these key enabling analysis technologies, we review the state of the art, and we elaborate on and provide indicative results for specific techniques that are particularly relevant to TV content.
