Concept-oriented indexing of video databases: toward semantic sensitive retrieval and browsing

Digital video now plays an important role in medical education, health care, telemedicine, and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed, but they still face four challenging problems: the semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework that makes progress toward solving these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation that uses principal video shots to enhance the quality of features; 2) semantic video concept interpretation that uses a flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework that integrates feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique that uses a domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.
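
As a rough illustration of the fourth component, concept-oriented organization through a concept hierarchy can be pictured as a tree whose nodes are semantic concepts and whose shots are attached to the concept they were classified under. The sketch below is only an assumption about how such an index might look: the class `ConceptNode`, its methods, the concept names, and the shot IDs are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConceptNode:
    """One node of a hypothetical domain-dependent concept hierarchy."""
    name: str
    children: List["ConceptNode"] = field(default_factory=list)
    shots: List[str] = field(default_factory=list)  # shot IDs labeled with this concept

    def add_child(self, name: str) -> "ConceptNode":
        """Create a sub-concept under this node and return it."""
        child = ConceptNode(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["ConceptNode"]:
        """Depth-first search for a concept by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def all_shots(self) -> List[str]:
        """Retrieval: shots of this concept plus those of every sub-concept."""
        result = list(self.shots)
        for child in self.children:
            result.extend(child.all_shots())
        return result


# Hypothetical hierarchy and shot labels, for illustration only.
root = ConceptNode("medical video")
surgery = root.add_child("surgery")
lecture = root.add_child("lecture")
trauma = surgery.add_child("trauma surgery")
trauma.shots += ["shot_001", "shot_002"]
lecture.shots.append("shot_003")

# Browsing: list the sub-concepts one level below the root.
top_level = [child.name for child in root.children]

# Retrieval: a query on a concept returns shots from its whole subtree.
surgery_shots = root.find("surgery").all_shots()
```

In this sketch, browsing walks down the tree one level at a time, while a query on a higher-level concept gathers the shots of all its descendants, so the same structure serves both access modes.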
