Paper information - TokyoTech+Canon at TRECVID 2011

The aim of this work is to develop a high-performance semantic indexing system using Gaussian mixture model (GMM) supervectors and tree-structured GMMs [1, 2]. GMM supervectors corresponding to six types of audio and visual features are extracted from video shots by using tree-structured GMMs. The tree-structured GMMs reduce the computational cost of the maximum a posteriori (MAP) adaptation used to estimate the GMM parameters while keeping accuracy at a high level. Our best result was a Mean InfAP (mean inferred average precision) of 17.3%, which ranked first among all semantic indexing runs in the full task.
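The supervector extraction described above can be sketched as follows. This is only a minimal sketch under stated assumptions: a flat diagonal-covariance GMM, mean-only MAP adaptation with a relevance factor `tau`, and a supervector formed by stacking the adapted means scaled by the mixture weights and standard deviations. The abstract does not give these details, so the function names, `tau`, and the normalization are illustrative choices rather than the authors' exact method, and the tree-structured speed-up is not shown.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """Log density of each row of X under each diagonal-covariance Gaussian.

    X: (n, d); means, variances: (K, d). Returns an (n, K) array.
    """
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + maha)


def map_adapt_means(X, weights, means, variances, tau=20.0):
    """Mean-only MAP adaptation of a background GMM to the features X.

    tau is a hypothetical relevance factor: larger values keep the
    adapted means closer to the background means.
    """
    # Posterior probability of each component for each feature vector.
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)

    n_k = resp.sum(axis=0)   # soft counts per component, shape (K,)
    first = resp.T @ X       # weighted feature sums, shape (K, d)
    return (tau * means + first) / (tau + n_k)[:, None]


def gmm_supervector(weights, adapted_means, variances):
    """Stack the normalized adapted means into one vector of length K * d."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.ravel()


# Toy usage: random data stands in for the local features of one video shot.
rng = np.random.default_rng(0)
K, d = 4, 3
ubm_weights = np.full(K, 1.0 / K)
ubm_means = rng.normal(size=(K, d))
ubm_variances = np.ones((K, d))
shot_features = rng.normal(size=(50, d))

shot_means = map_adapt_means(shot_features, ubm_weights, ubm_means, ubm_variances)
supervector = gmm_supervector(ubm_weights, shot_means, ubm_variances)
print(supervector.shape)  # (12,)
```

In this sketch every feature vector is scored against every Gaussian, so the cost grows with the number of components. That is the step a tree-structured GMM is meant to make cheaper.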

[1] Matti Pietikäinen, et al. A comparative study of texture measures with classification based on featured distributions, 1996, Pattern Recognit.

[2] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[3] Berthold K. P. Horn, et al. Determining Optical Flow, 1981.

[4] Andrea Vedaldi, et al. Vlfeat: an open and portable library of computer vision algorithms, 2010, ACM Multimedia.

[5] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6] Koichi Shinoda, et al. High-Level Feature Extraction Using SIFT GMMs and Audio Models, 2010, 2010 20th International Conference on Pattern Recognition.

[7] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] Cordelia Schmid, et al. Coloring Local Feature Extraction, 2006, ECCV.

[9] Steve Young, et al. The HTK book version 3.4, 2006.

[10] Michael Isard, et al. ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework, 1998, ECCV.

[11] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[12] Steve Young, et al. The HTK book, 1995.

[13] Koichi Shinoda, et al. A fast MAP adaptation technique for GMM-supervector-based video semantic indexing systems, 2011, ACM Multimedia.

[14] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).