Undirected Graphical Models for Video Analysis and Classification

Accurate and efficient video classification and retrieval demand the fusion of multimodal information and the use of intermediate representations. This paper describes an undirected graphical model based on the exponential-family harmonium, which derives intermediate semantic representations of video data by jointly modeling the textual and image information in the video. We propose an extension of the model that derives category-specific video representations and integrates video classification as part of the modeling process. We report satisfactory classification performance on a set of 15 video categories from the TRECVID collection, as well as a comparison of the effectiveness of different inference algorithms.