ViBE: a compressed video database structured for active browsing and search

In this paper, we describe a new paradigm for video database management known as ViBE (video indexing and browsing environment). ViBE is a browseable and searchable paradigm for organizing video data containing a large number of sequences. The system first segments video sequences into shots using a new feature vector, the Generalized Trace, obtained from the DC-sequence of the compressed data. Each video shot is then represented by a hierarchical structure known as the shot tree. The shots are then classified into pseudo-semantic classes that describe the shot content. Finally, the results are presented to the user in an active browsing environment using a similarity pyramid data structure. The similarity pyramid allows the user to view the video database at various levels of detail. The user can also define semantic classes and reorganize the browsing environment based on relevance feedback. We describe how ViBE performs on a database of MPEG sequences.
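The shot segmentation step can be illustrated with a minimal sketch. The abstract does not define the Generalized Trace, so the code below substitutes a single, much simpler feature: the histogram difference between consecutive low-resolution frames, compared against a fixed threshold. The function names, the bin count, and the threshold are all assumptions made for illustration and are not taken from the paper. Each frame is assumed to be a non-empty flat list of 8-bit luminance values, standing in for one image of the DC-sequence.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one low-resolution (DC-like) frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // max_value, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """One minus the histogram intersection: 0 for identical, 1 for disjoint."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Indices i where a cut is declared between frame i-1 and frame i.

    The fixed threshold is a hypothetical stand-in for whatever decision
    rule the real system applies to its feature vector.
    """
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


def segment_into_shots(
    frames: Sequence[Sequence[int]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a frame sequence into shots as (start, end) pairs, end exclusive."""
    if not frames:
        return []
    bounds = [0] + detect_cuts(frames, threshold) + [len(frames)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


# Usage: three dark frames followed by two bright frames form two shots.
dark = [10] * 16
bright = [200] * 16
shots = segment_into_shots([dark, dark, dark, bright, bright])
```

A single thresholded feature like this tends to miss gradual transitions and to fire on flashes or fast motion, which is one reason a system might combine several features into a vector instead.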
