Automatic Video Annotation Using Multimodal Dirichlet Process Mixture Model

In this paper we infer a multimodal Dirichlet process mixture model from video data, in which the mixture components follow a Gaussian-multinomial distribution. The model clusters freely available multimodal data in videos, i.e., the visual track together with keywords extracted from speech transcripts obtained from the audio track. The keywords follow a multinomial distribution, while the features used to represent each video shot follow a Gaussian distribution. We infer the model by collecting samples from the corresponding Markov chain using a blocked Gibbs sampling algorithm, and we use the inferred parameters to build a predictive model that outputs keyword annotations for given video shots; these annotations can then be used to perform text-based retrieval of shots. We compare the performance of the proposed model with baseline models that also use predicted annotations for retrieval.
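As a rough sketch, the generative process implied by the abstract can be written as follows; the notation ($\alpha$, $G_0$, $\mu$, $\Sigma$, $\phi$) is our own shorthand for a standard Dirichlet process mixture and is only assumed to correspond to the paper's formulation:
\begin{align*}
G &\sim \mathrm{DP}(\alpha, G_0) && \text{Dirichlet process prior over component parameters}\\
\theta_i = (\mu_i, \Sigma_i, \phi_i) &\sim G && \text{parameters assigned to video shot } i\\
x_i \mid \mu_i, \Sigma_i &\sim \mathcal{N}(\mu_i, \Sigma_i) && \text{Gaussian visual features of the shot}\\
w_{i1}, \dots, w_{iM_i} \mid \phi_i &\sim \mathrm{Multinomial}(\phi_i) && \text{keywords from the speech transcript}
\end{align*}
Under this sketch, annotating a new shot with features $x^{*}$ amounts to averaging the keyword distributions of the inferred components, weighted by how well each component explains the visual features:
\[
p(w \mid x^{*}) \;\propto\; \sum_{k} \pi_k \, \mathcal{N}(x^{*} \mid \mu_k, \Sigma_k) \, \phi_{k,w},
\]
where $\pi_k$ denotes the inferred mixture weights. The highest-probability keywords under $p(w \mid x^{*})$ would then serve as the predicted annotations used for text-based retrieval.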