Towards Theoretical Performance Limits of Video Parsing

This paper examines the problem of temporal video segmentation, or video parsing, and explores the possibility of defining theoretical limits on the expected performance of a general parsing algorithm. In particular, we address the challenge of computing the coherence of video content, which is critical to an algorithm's ability to parse a video automatically. If this coherence is difficult to extract from the video data, it is unrealistic to expect any parsing algorithm applied to that data to perform optimally with respect to the ground truth, regardless of the features and approach used. The coherence computability (CC) measure introduced in this paper is derived from the average uncertainty in extracting content-related information from the data. This uncertainty translates into the uncertainty of deciding whether a boundary is present at a given time stamp of a video. We argue that the CC measure reveals the true quality of a video parsing algorithm better than the classical comparison of parsing results with the ground truth. We also discuss how this measure can be used to characterize and compare video sequences in terms of their expected parsing performance, and to interpret and evaluate the obtained parsing results accordingly.
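The abstract does not give the CC formula. As a rough intuition only, the idea of "average uncertainty of the boundary decision per time stamp" can be sketched with Shannon entropy. The sketch assumes (hypothetically) that an estimate of P(boundary at time stamp t) is available for every time stamp, treats each decision as a binary random variable, and defines a hypothetical score as one minus the mean binary entropy in bits. It is not the measure defined in the paper. Under this assumption, a score of 1 means every boundary decision is certain and 0 means every decision is a coin toss.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a boundary/no-boundary decision with P(boundary) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a certain decision carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def coherence_computability(boundary_probs: Sequence[float]) -> float:
    """Hypothetical CC-style score: 1 minus the mean per-time-stamp decision entropy.

    boundary_probs[t] is an ASSUMED estimate of P(boundary at time stamp t).
    Because binary entropy is at most 1 bit, the result lies in [0, 1].
    """
    if len(boundary_probs) == 0:
        raise ValueError("need at least one time stamp")
    mean_h = sum(binary_entropy(p) for p in boundary_probs) / len(boundary_probs)
    return 1.0 - mean_h


if __name__ == "__main__":
    # Illustrative (made-up) probabilities, not data from the paper.
    clear_video = [0.0, 0.02, 0.97, 0.01]     # boundaries are easy to decide
    ambiguous_video = [0.5, 0.45, 0.55, 0.5]  # boundaries are hard to decide
    print(coherence_computability(clear_video))
    print(coherence_computability(ambiguous_video))
```

In this toy setting, the ambiguous sequence receives a score near 0. This suggests that low parsing accuracy on such a sequence may reflect the data rather than the algorithm.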
