TokyoTech+Canon at TRECVID 2011

Our aim is to develop a high-performance semantic indexing system using Gaussian mixture model (GMM) supervectors and tree-structured GMMs [1, 2]. GMM supervectors corresponding to six types of audio and visual features are extracted from video shots by using tree-structured GMMs. The tree-structured GMMs reduce the computational cost of the maximum a posteriori (MAP) adaptation used to estimate GMM parameters while keeping accuracy high. Our best result was 17.3% in terms of Mean InfAP, which ranked first among all semantic indexing runs in the full task.
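To make the supervector idea concrete, the following is a minimal sketch of how a GMM supervector could be obtained for one shot by MAP-adapting the means of a background GMM. It is not the system described above. The diagonal covariances, the relevance factor `tau`, the mean-only update, and the normalization in `supervector` are common textbook choices assumed here for illustration, and all function and parameter names are hypothetical. The sketch evaluates every Gaussian for every feature vector. This exhaustive posterior computation is the cost that a tree-structured GMM is meant to reduce, and the tree search itself is not shown.

```python
import numpy as np


def map_adapt_means(X, weights, means, variances, tau=20.0):
    """MAP-adapt the means of a diagonal-covariance GMM to the features X.

    X: (n, d) features of one shot; weights: (K,); means, variances: (K, d).
    tau is an assumed relevance factor, not a value taken from the paper.
    """
    # Log-density of every feature under every Gaussian, shape (n, K).
    diff = X[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)          # posteriors per feature

    counts = post.sum(axis=0)                        # soft count per Gaussian
    first_order = post.T @ X                         # posterior-weighted sums
    # Interpolate between the prior means and the data, weighted by tau.
    return (tau * means + first_order) / (tau + counts)[:, None]


def supervector(weights, adapted_means, variances):
    """Stack the normalized adapted means into one vector of length K * d."""
    normalized = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return normalized.ravel()
```

With `K` Gaussians and `d`-dimensional features, the resulting vector has `K * d` entries, and one such vector per feature type could then be passed to a classifier.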

[1] Matti Pietikäinen, et al. A comparative study of texture measures with classification based on featured distributions, 1996, Pattern Recognit.

[2] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[3] Berthold K. P. Horn, et al. Determining Optical Flow, 1981.

[4] Andrea Vedaldi, et al. Vlfeat: an open and portable library of computer vision algorithms, 2010, ACM Multimedia.

[5] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).

[6] Koichi Shinoda, et al. High-Level Feature Extraction Using SIFT GMMs and Audio Models, 2010, 2010 20th International Conference on Pattern Recognition.

[7] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] Cordelia Schmid, et al. Coloring Local Feature Extraction, 2006, ECCV.

[9] Steve Young, et al. The HTK book version 3.4, 2006.

[10] Michael Isard, et al. ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework, 1998, ECCV.

[11] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[12] Steve Young, et al. The HTK book, 1995.

[13] Koichi Shinoda, et al. A fast MAP adaptation technique for GMM-supervector-based video semantic indexing systems, 2011, ACM Multimedia.

[14] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).