We propose a fast maximum a posteriori (MAP) adaptation technique for a GMM-supervector-based video semantic indexing system. GMM supervectors are among the state-of-the-art representations for this task, and they require MAP adaptation to estimate the distribution of local features extracted from video data. The proposed method reduces the calculation time of the MAP adaptation step. It constructs a tree-structured GMM to quickly calculate the posterior probability of each mixture component of a GMM. The basic idea of the tree-structured GMM is to cluster Gaussian components and approximate each cluster with a single Gaussian. Leaf nodes of the tree correspond to the mixture components, and each non-leaf node holds a single Gaussian that approximates its descendant Gaussian distributions. Experimental evaluation on the TRECVID 2010 dataset demonstrates the effectiveness of the proposed method. The calculation time of the MAP adaptation step is reduced by 76.2% compared to that of a conventional method, and the resulting accuracy (in terms of mean average precision) is 10.2%.
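The tree-structured posterior calculation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-level tree, diagonal covariances, moment matching to form each non-leaf Gaussian, and a simple pruning rule that descends only into the `n_active` best-scoring non-leaf nodes. The function names, the `groups` clustering input, and the `n_active` parameter are all hypothetical.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of K diagonal Gaussians at x; x is (D,), mean and var are (K, D)."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var), axis=1)
                   + np.sum((x - mean) ** 2 / var, axis=1))

def merge_gaussians(weights, means, variances):
    """Approximate a weighted set of diagonal Gaussians by one Gaussian (moment matching)."""
    w = weights / weights.sum()
    mean = w @ means
    var = w @ (variances + means ** 2) - mean ** 2
    return weights.sum(), mean, var

def build_two_level_tree(weights, means, variances, groups):
    """Build one non-leaf node per group; groups is a list of leaf-index arrays (assumed given)."""
    nodes = [merge_gaussians(weights[g], means[g], variances[g]) for g in groups]
    node_w = np.array([n[0] for n in nodes])
    node_mean = np.stack([n[1] for n in nodes])
    node_var = np.stack([n[2] for n in nodes])
    return node_w, node_mean, node_var

def tree_posteriors(x, weights, means, variances, groups, tree, n_active=1):
    """Approximate component posteriors by evaluating only leaves under the best non-leaf nodes."""
    node_w, node_mean, node_var = tree
    node_score = np.log(node_w) + log_gauss_diag(x, node_mean, node_var)
    active = np.argsort(node_score)[-n_active:]          # assumed pruning rule
    leaf_idx = np.concatenate([groups[a] for a in active])
    log_p = np.log(weights[leaf_idx]) + log_gauss_diag(
        x, means[leaf_idx], variances[leaf_idx])
    p = np.exp(log_p - log_p.max())                      # stabilized normalization
    post = np.zeros(len(weights))                        # pruned leaves get zero posterior
    post[leaf_idx] = p / p.sum()
    return post
```

In this sketch the saving comes from evaluating only the non-leaf Gaussians plus the leaves under the selected nodes, rather than every mixture component for every local feature. The approximation is close to the exact posterior only when the pruned components carry negligible probability mass.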