Finding structure in home videos by probabilistic hierarchical clustering

Accessing, organizing, and manipulating home videos present technical challenges because of their unrestricted content and lack of storyline. We present a methodology for discovering cluster structure in home videos. It uses video shots as the unit of organization and rests on two concepts: (1) statistical models of the visual similarity, duration, and temporal adjacency of consumer video segments and (2) a reformulation of hierarchical clustering as a sequential binary Bayesian classification process. The Bayesian formulation allows prior knowledge of the structure of home video to be incorporated and offers the advantages of a principled methodology. Gaussian mixture models represent the class-conditional distributions of intra- and inter-segment visual and temporal features. These models are then used in the probabilistic clustering algorithm, in which the merging order is a variation of highest confidence first and the merging criterion is maximum a posteriori. The algorithm requires no ad hoc parameter determination. We present extensive results on a 10-hour home-video database with ground truth; these results validate the methodology with respect to cluster detection, individual shot-cluster labeling, and the effect of prior selection.
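The merging procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional shot feature, the mixture parameters in `SAME_CLUSTER` and `DIFF_CLUSTER`, the `segment_distance` function, and the restriction to temporally adjacent pairs are all assumptions made for illustration. In the paper the mixture models are learned from data and combine visual, duration, and adjacency features. The sketch shows only the general pattern: each candidate merge is a binary Bayesian decision, the most confident candidate is considered first, and merging stops when the maximum a posteriori decision is "do not merge".

```python
import math

def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in components
    )

# Hypothetical class-conditional models of the inter-segment distance.
# Real models would be fitted (e.g. with EM) on labeled training video.
SAME_CLUSTER = [(0.7, 0.1, 0.1), (0.3, 0.3, 0.15)]   # p(x | segments belong together)
DIFF_CLUSTER = [(0.6, 0.8, 0.2), (0.4, 1.2, 0.3)]    # p(x | segments differ)

def posterior_merge(x, prior_merge=0.5):
    """Posterior probability of the 'merge' class given distance x (Bayes' rule)."""
    a = prior_merge * gmm_pdf(x, SAME_CLUSTER)
    b = (1.0 - prior_merge) * gmm_pdf(x, DIFF_CLUSTER)
    return a / (a + b) if (a + b) > 0.0 else 0.0

def segment_distance(seg_a, seg_b):
    """Assumed feature: absolute difference of the segments' mean shot value."""
    return abs(sum(seg_a) / len(seg_a) - sum(seg_b) / len(seg_b))

def cluster_shots(shots, prior_merge=0.5):
    """Greedy merging of temporally adjacent segments, most confident pair first."""
    segments = [[s] for s in shots]          # start with one segment per shot
    while len(segments) > 1:
        best_i, best_p = None, -1.0
        for i in range(len(segments) - 1):
            d = segment_distance(segments[i], segments[i + 1])
            p = posterior_merge(d, prior_merge)
            if p > best_p:
                best_i, best_p = i, p
        # MAP decision for two classes: merge only if P(merge | x) > 0.5.
        if best_p <= 0.5:
            break
        segments[best_i:best_i + 2] = [segments[best_i] + segments[best_i + 1]]
    return segments

if __name__ == "__main__":
    print(cluster_shots([0.10, 0.12, 0.15, 0.90, 0.95, 0.20]))
```

Changing `prior_merge` shifts the decision boundary, which is one simple way to explore the effect of prior selection mentioned above.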
