Statistical learning techniques provide a robust framework for learning representations of semantic concepts from multimedia features. The bottleneck is the number of training samples needed to construct reliable models, and collecting them is particularly expensive when annotation must happen at a finer granularity. We present a novel approach in which annotations may be entered at a coarser spatial granularity while the concept is still learnt at a finer granularity, which can speed up annotation significantly. Using the multiple instance learning paradigm, we show that it is possible to learn representations of concepts occurring at the regional level by using annotations for several images. We present a generalized multiple instance learning algorithm with three variations in the strategy used to select the most likely positive instance from a positively annotated bag. Furthermore, we show how the three strategies can be combined, and we demonstrate a 15% performance improvement over any single strategy on a few regional semantic concepts from the TRECVID 2003 benchmark corpus.
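To make the idea of selecting the most likely positive instance from a positively annotated bag concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe the three selection strategies, so the scoring rule used here (prefer the region closest to a running concept prototype and farthest from the centroid of regions in negative bags) is an assumption chosen purely for illustration, and every name in it (`select_positive_instances`, `positive_bags`, `negative_bags`) is hypothetical. A bag stands for one image and each instance for the feature vector of one region.

```python
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [fmean(dim) for dim in zip(*points)]


def score(x, pos_center, neg_center):
    """Assumed selection rule: far from negatives, close to the prototype."""
    return sq_dist(x, neg_center) - sq_dist(x, pos_center)


def select_positive_instances(positive_bags, negative_bags, max_iters=20):
    """Alternate between picking one instance per positive bag and
    re-estimating the concept prototype from the picked instances."""
    neg_center = centroid([x for bag in negative_bags for x in bag])
    # Start from the mean of every instance in the positive bags.
    pos_center = centroid([x for bag in positive_bags for x in bag])
    chosen = None
    for _ in range(max_iters):
        new_chosen = [
            max(range(len(bag)),
                key=lambda i, b=bag: score(b[i], pos_center, neg_center))
            for bag in positive_bags
        ]
        if new_chosen == chosen:  # selection is stable, so stop
            break
        chosen = new_chosen
        pos_center = centroid(
            [bag[i] for bag, i in zip(positive_bags, chosen)]
        )
    return chosen, pos_center


# Toy data: each positive image has exactly one region near (5, 5);
# all other regions, and all negative regions, lie near the origin.
positive_bags = [
    [[0.1, 0.2], [5.0, 5.1], [0.3, -0.1]],
    [[4.9, 5.2], [0.0, 0.1]],
    [[-0.2, 0.1], [0.2, 0.0], [5.1, 4.8]],
]
negative_bags = [
    [[0.0, 0.0], [0.2, 0.1]],
    [[-0.1, 0.3], [0.1, -0.2]],
]

chosen, prototype = select_positive_instances(positive_bags, negative_bags)
print(chosen)  # index of the selected region in each positive bag
```

On this toy data the image-level labels alone are enough to single out the region near (5, 5) in each positive image, which is the sense in which a regional concept can be learnt from coarser annotations. Combining several selection rules, as the abstract proposes, would amount to replacing or fusing the `score` function; how the authors do this is not stated here.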
[1] Qi Zhang, et al. EM-DD: An Improved Multiple-Instance Learning Technique. NIPS, 2001.
[2] Tomás Lozano-Pérez, et al. A Framework for Multiple-Instance Learning. NIPS, 1997.
[3] John R. Smith, et al. A generalized multiple instance learning algorithm for large scale modeling of multimedia semantics. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
[4] W. Eric L. Grimson, et al. A framework for learning query concepts in image classification. Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), 1999.