Survey of compressed-domain features used in audio-visual indexing and analysis

Abstract: In this paper, we attempt to provide a comprehensive, high-level review of audio-visual features that can be extracted from standard compressed domains such as MPEG-1 and MPEG-2. The paper is motivated by the large body of active research on extracting and applying compressed-domain features in fields such as indexing, filtering, and manipulation. Compressed-domain approaches avoid the expensive computation and memory requirements involved in decoding and/or re-encoding. The selected features are categorized into four groups: spatial visual (e.g., color, texture, edge, shape), motion (e.g., motion field, trajectory), audio (e.g., energy, spectral features, pitch), and coding (e.g., bit rate, frame/block type). For each feature, we briefly discuss the extraction methods, computational complexity, potential effectiveness in applications, and possible limitations caused by compressed-domain approaches. Finally, we discuss the possibility of extracting some important MPEG-7 visual and audio descriptors directly from the compressed domain.
