A Generalized Multiple Instance Learning Algorithm for Iterative Distillation and Cross-Granular Propagation of Video Annotations

Video annotation is an expensive but necessary task for most vision and learning problems that require building models of visual semantics. It becomes prohibitively expensive when annotation must be done at the finer granularity of regions within the video. One way around this dilemma is to collect annotations at a coarser granularity and then propagate them to the finer granularity in a concept-dependent way. In this paper we propose a new generalized multiple instance learning algorithm that works with any underlying density modeling technique and helps propagate semantic concepts provided at the coarse granularity of video key-frames to finer-grained regions. Our experiments on the NIST TRECVID common annotation corpus show improvements in annotation propagation accuracy ranging from 3% to as much as 161%.
