A generalized multiple instance learning algorithm for large scale modeling of multimedia semantics

Statistical learning techniques provide a robust framework for learning representations of semantic concepts from multimedia features. The bottleneck is the number of annotated training samples needed to construct robust models, and annotation is particularly expensive when it must be done at a fine granularity. We present a novel approach in which annotations may be entered at a coarser spatial granularity while the concept is still learnt at a finer granularity, which can speed up annotation significantly. Using the multiple instance learning paradigm, we show that it is possible to learn representations of concepts occurring at the regional level by using only coarse annotations for several images. We present a generalized multiple instance learning algorithm that scales to a large number of training samples as well as a large number of instances per bag. The algorithm also allows different density modeling or regression techniques to be plugged in. Using the TREC 2001 corpus, we demonstrate that the proposed algorithm outperforms the existing diverse density algorithm.
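To make the multiple instance setting concrete, the following is a minimal sketch of the noisy-or diverse density objective, the existing baseline the abstract compares against, not the generalized algorithm proposed here. Each image is a bag of region feature vectors, and only the bag carries a label. All names (`instance_prob`, `neg_log_diverse_density`), the Gaussian-like instance model with unit scale, and the toy 2-D features are assumptions made for illustration.

```python
import numpy as np

def instance_prob(bag, t, scale=1.0):
    """Probability that each instance in a bag matches candidate concept t.

    Assumption: a Gaussian-like kernel on squared Euclidean distance.
    """
    d2 = np.sum((np.asarray(bag, dtype=float) - t) ** 2, axis=1)
    return np.exp(-scale * d2)

def neg_log_diverse_density(t, positive_bags, negative_bags, eps=1e-12):
    """Negative log of the noisy-or diverse density at candidate point t."""
    nll = 0.0
    for bag in positive_bags:
        # A positive bag needs at least one instance close to t.
        p_bag = 1.0 - np.prod(1.0 - instance_prob(bag, t))
        nll -= np.log(p_bag + eps)
    for bag in negative_bags:
        # A negative bag needs every instance far from t.
        p_none = np.prod(1.0 - instance_prob(bag, t))
        nll -= np.log(p_none + eps)
    return nll

# Toy data (hypothetical): each bag is one image, each row one region feature.
# Every positive bag holds one region near (1, 1) plus unrelated clutter.
positive_bags = [
    np.array([[1.0, 1.0], [5.0, 0.0]]),
    np.array([[1.1, 0.9], [0.0, 5.0]]),
    np.array([[0.9, 1.1], [-4.0, -4.0]]),
]
negative_bags = [
    np.array([[5.0, 0.1], [0.0, 4.9]]),
    np.array([[-4.0, -3.9], [5.1, 0.0]]),
]

# Score every positive instance as a candidate concept and keep the best.
# A full implementation would refine candidates with gradient-based search.
candidates = np.vstack(positive_bags)
scores = [neg_log_diverse_density(t, positive_bags, negative_bags)
          for t in candidates]
best = candidates[int(np.argmin(scores))]
print("best candidate concept:", best)
```

The search over positive instances only selects a starting point; it illustrates how region-level structure can be recovered from bag-level labels, under the assumptions stated above.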
