Exploring functionalities in the compressed image/video domain

Real-time image manipulation and indexing are critical in many multimedia applications. Imagine a typical task for a broadcast video journalist: he or she needs to find a few specific video clips in a huge video collection (e.g., hundreds of thousands of hours of video), then edit them into the final presentation video within just a few minutes or even seconds. Today’s state-of-the-art video editing systems require dedicated, high-end hardware. Real-time video manipulation on regular desktop computers is still not feasible. For video indexing/search, a current research area, content-based visual query [Niblack et al. 1993] explores new tools allowing users to find images efficiently from large databases based on visual content. Usually image features, such as object shape, color, texture, layout, motion, and spatial/temporal structure, are extracted automatically or semi-automatically in the indexing stage, then used for image screening (similarity comparison) in the query stage. To achieve real-time performance, this paper proposes the compressed-domain approach for image/video manipulation and automatic visual content extraction. In most practical applications, images and videos are represented in compressed formats such as JPEG, MPEG, and wavelet. There are significant advantages to doing visual content extraction and video manipulation directly in the compressed domain. First, compressed videos and images need not be decoded back to the original uncompressed pixel format, which requires large storage space and processing power. The data rate is usually much lower in the compressed domain than in the original uncompressed form (e.g., a 20:1 reduction in the JPEG domain). Second, if re-encoding of images/videos is not needed, visual quality can be maintained at the original level without the additional degradation introduced by the re-encoding process. Compressed-domain image technologies have attracted much interest recently, including our ongoing work [Chang 1995].
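The indexing and query stages described above can be illustrated with a minimal sketch. It is not the system proposed in this paper. It assumes that the dequantized DC coefficient of each 8x8 luminance block has already been parsed from a JPEG bitstream, and it relies on the fact that, for the 8x8 DCT used by JPEG, the DC term equals the sum of the 64 level-shifted samples divided by 8, so a block mean is available without an inverse transform. The brightness-histogram feature, the function names, and the sample DC values are all hypothetical choices made for illustration.

```python
def block_means_from_dc(dc_values, level_shift=128.0):
    """Recover each 8x8 block's mean intensity from its dequantized DC term.

    For the 8x8 DCT used by JPEG, DC = (sum of 64 level-shifted samples) / 8,
    so mean = DC / 8 + level_shift. No inverse DCT is needed.
    """
    return [dc / 8.0 + level_shift for dc in dc_values]


def intensity_histogram(means, bins=4, max_value=256.0):
    """Normalized histogram of block means: a coarse brightness feature."""
    counts = [0] * bins
    for m in means:
        idx = int(m / max_value * bins)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range values
        counts[idx] += 1
    total = float(len(means))
    return [c / total for c in counts]


def l1_distance(f, g):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(f, g))


def build_index(collection):
    """Indexing stage: map each image name to its feature vector."""
    return {name: intensity_histogram(block_means_from_dc(dc))
            for name, dc in collection.items()}


def query(index, query_dc, top_k=1):
    """Query stage: rank indexed images by feature distance to the query."""
    q = intensity_histogram(block_means_from_dc(query_dc))
    ranked = sorted(index, key=lambda name: l1_distance(index[name], q))
    return ranked[:top_k]


# Hypothetical DC terms for three tiny "images" of four blocks each.
collection = {
    "dark":   [-900.0, -850.0, -800.0, -950.0],
    "bright": [800.0, 850.0, 900.0, 700.0],
    "mixed":  [-900.0, 900.0, -100.0, 100.0],
}
index = build_index(collection)
best = query(index, [-880.0, -820.0, -910.0, -790.0])
```

Here the query's blocks are all dark, so the image named "dark" ranks first. A practical system would use richer features (color, texture, shape, motion), but the division of work between an offline indexing stage and a fast query-time comparison is the same.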
Manipulation functions such as geometrical transformation and video compositing have been developed in the discrete cosine transform (DCT)-based compressed domain [Chang 1993; Smith and Rowe 1993] and the MPEG-based compressed domain [Chang 1993]. A computation reduction of up to 80% can be achieved for some manipulation functions such as scaling and picture-in-picture overlap. We have also demonstrated techniques for compressed-domain nonlinear video editing in Meng and Chang [1996]. Feature extraction from compressed images/videos has been demonstrated for content-based query in Chang [1995], Yeo and Liu [1995], and Meng and Chang [1996]. There are challenging technical issues in designing compressed-domain functions. Each compression algorithm uses some specific syntax and data structure. For example, the JPEG image coding
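One reason compositing can be carried out on DCT coefficients is that the DCT is a linear transform: a weighted blend of two pixel blocks has the same coefficients as the same weighted blend of their DCT blocks. The sketch below checks this numerically on two synthetic 8x8 blocks. It covers only the transform step and is not the algorithm of the cited works. Quantization, entropy coding, and MPEG motion compensation are ignored, and the block contents and the blend weight `alpha` are arbitrary assumptions.

```python
import math

N = 8  # block size used by JPEG and MPEG


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N lists of N numbers)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def blend(a, b, alpha):
    """Element-wise alpha * a + (1 - alpha) * b for two N x N arrays."""
    return [[alpha * a[i][j] + (1.0 - alpha) * b[i][j] for j in range(N)]
            for i in range(N)]


# Two synthetic pixel blocks standing in for foreground and background.
fg = [[(x * 8 + y) % 256 for y in range(N)] for x in range(N)]
bg = [[(x * y * 3) % 256 for y in range(N)] for x in range(N)]
alpha = 0.25

via_pixels = dct2(blend(fg, bg, alpha))     # blend pixels, then transform
via_dct = blend(dct2(fg), dct2(bg), alpha)  # transform, then blend coefficients

# Largest coefficient difference between the two routes (rounding error only).
max_err = max(abs(via_pixels[u][v] - via_dct[u][v])
              for u in range(N) for v in range(N))
```

The two routes agree up to floating-point rounding, so a uniform blend can be applied to coefficients without an inverse transform. Operations that are not block-aligned, such as arbitrary translation or scaling, need additional machinery beyond this property.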

[1] Lawrence A. Rowe, et al. Algorithms for manipulating compressed images, 1993, IEEE Computer Graphics and Applications.

[2] Shih-Fu Chang, et al. Tools for compressed-domain video indexing and editing, 1996, Electronic Imaging.

[3] Christos Faloutsos, et al. QBIC project: querying images by content, using color, texture, and shape, 1993, Electronic Imaging.

[4] Boon-Lock Yeo, et al. A unified approach to temporal segmentation of motion JPEG and MPEG compressed video, 1995, Proceedings of the International Conference on Multimedia Computing and Systems.

[5] Shih-Fu Chang, et al. Compressed-domain techniques for image/video indexing and manipulation, 1995, Proceedings., International Conference on Image Processing.