Statistical models of video structure for content analysis and characterization

Content structure plays an important role in the understanding of video. In this paper, we argue that knowledge about structure can be used both to improve the performance of content analysis and to extract features that convey semantic information about the content. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models with two practical applications. First, we develop a Bayesian formulation for the shot segmentation problem that is shown to extend the standard thresholding model in an adaptive and intuitive way, leading to improved segmentation accuracy. Second, by applying the transformation into the shot duration/activity feature space to a database of movie clips, we illustrate how the Bayesian model captures semantic properties of the content. We suggest ways in which these properties can be used as a basis for intuitive content-based access to movie libraries.
