Multimedia Topic Models Considering Burstiness of Local Features

SUMMARY Many studies have addressed topic modeling for various types of data, including text and image data. In this paper, we focus on the burstiness of local features when modeling topics in video data. Burstiness is a phenomenon often discussed for text data: if a word is used once in a document, it is more likely to be used again within that document. The same phenomenon is observed in video data; for example, an object or visual word that appears in a video is more likely to appear repeatedly within the same video. Based on this observation, we propose a new topic model, the Correspondence Dirichlet Compound Multinomial LDA (Corr-DCMLDA), which takes into account the burstiness of local features in video data. The unknown parameters and latent variables of the model are estimated by collapsed Gibbs sampling, and the hyperparameters are estimated by fixed-point iteration. Experiments on genre classification of social video data demonstrate that our model performs more effectively than several baselines.
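The abstract names fixed-point iteration for the hyperparameters but does not state the update. As a rough illustration only, the sketch below assumes the standard Dirichlet compound multinomial (DCM) likelihood and applies Minka's fixed-point iteration for the Pólya distribution to per-document counts of local features (visual words). In a DCM-based topic model these counts would come from the topic assignments drawn by collapsed Gibbs sampling; here they are supplied directly. The function names `dcm_log_likelihood`, `digamma_diff`, and `fit_dcm_alpha` and the toy data are hypothetical, and this is not the authors' Corr-DCMLDA implementation.

```python
import math
from typing import List, Sequence

# Sketch under stated assumptions: a generic Dirichlet compound multinomial
# (DCM, also called the multivariate Polya distribution) hyperparameter fit
# using Minka's fixed-point iteration. It is NOT the Corr-DCMLDA update from
# the paper; names are hypothetical and the counts are supplied directly
# rather than produced by a Gibbs sampler.


def dcm_log_likelihood(counts: Sequence[Sequence[int]], alpha: Sequence[float]) -> float:
    """DCM log-likelihood of per-document count vectors, omitting the
    multinomial coefficient (it does not depend on alpha)."""
    alpha0 = sum(alpha)
    total = 0.0
    for doc in counts:
        n = sum(doc)
        total += math.lgamma(alpha0) - math.lgamma(n + alpha0)
        for n_k, a_k in zip(doc, alpha):
            total += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return total


def digamma_diff(a: float, n: int) -> float:
    """psi(a + n) - psi(a) for a non-negative integer n, computed exactly
    as the finite sum 1/a + 1/(a+1) + ... + 1/(a+n-1)."""
    return sum(1.0 / (a + i) for i in range(n))


def fit_dcm_alpha(
    counts: Sequence[Sequence[int]],
    alpha: Sequence[float],
    iters: int = 100,
    tol: float = 1e-8,
    floor: float = 1e-10,
) -> List[float]:
    """Fixed-point update
        alpha_k <- alpha_k * sum_d [psi(n_dk + alpha_k) - psi(alpha_k)]
                           / sum_d [psi(n_d + alpha0) - psi(alpha0)]
    where n_d is the total count of document d and alpha0 = sum_k alpha_k."""
    alpha = list(alpha)
    for _ in range(iters):
        alpha0 = sum(alpha)
        den = sum(digamma_diff(alpha0, sum(doc)) for doc in counts)
        if den == 0.0:  # every document is empty: nothing to fit
            break
        new_alpha = []
        for k, a_k in enumerate(alpha):
            num = sum(digamma_diff(a_k, doc[k]) for doc in counts)
            # Floor keeps alpha_k positive when feature k never occurs.
            new_alpha.append(max(a_k * num / den, floor))
        shift = max(abs(x - y) for x, y in zip(new_alpha, alpha))
        alpha = new_alpha
        if shift < tol:
            break
    return alpha


if __name__ == "__main__":
    bursty = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # each feature repeats in one doc
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]       # features spread evenly
    for name, data in (("bursty", bursty), ("flat", flat)):
        fitted = fit_dcm_alpha(data, [1.0, 1.0, 1.0])
        print(name, "alpha0 =", round(sum(fitted), 4))
```

On the toy data, the one-hot "bursty" counts drive the concentration α0 toward zero, whereas evenly spread counts push it upward toward the multinomial limit. A small DCM concentration is what lets such a model assign high probability to the same local feature repeating within one video.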
