Semantic annotation of multimedia using maximum entropy models

In this paper, we propose a maximum entropy-based approach for the automatic annotation of multimedia content. In our approach, we explicitly model the spatial location of low-level features by means of specially designed predicates. In addition, the interaction between low-level features is modeled using joint observation predicates. We evaluate the performance of semantic concept classifiers built using this approach on the TRECVID2003 corpus. Experiments indicate that our model's performance is on par with the best results reported to date on this dataset, despite using only unimodal features and a single approach to model building. This compares favorably with state-of-the-art systems, which rely on multimodal features and classifier fusion to achieve similar results on this corpus.
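To make the modeling idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a conditional maximum entropy model p(concept | observation) over binary predicates. The predicate design here is a hypothetical simplification: each predicate fires for a particular (quantized low-level feature, spatial region) pair, loosely mirroring the paper's spatially localized feature predicates; all function names and the toy data are assumptions for illustration.

```python
import numpy as np

def predicates(feature_id, region_id, n_features=4, n_regions=4):
    """Binary predicate vector that fires for one (feature, region) pair.

    This is a hypothetical predicate scheme: pairing a quantized low-level
    feature with a spatial region gives a spatially localized predicate.
    """
    f = np.zeros(n_features * n_regions)
    f[feature_id * n_regions + region_id] = 1.0
    return f

def maxent_probs(weights, f):
    """p(c | x) = exp(w_c . f) / Z -- the log-linear maximum entropy form."""
    scores = weights @ f
    scores -= scores.max()          # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def train(data, n_classes, dim, lr=0.5, epochs=200):
    """Gradient ascent on the conditional log-likelihood.

    The gradient for each example is the observed predicate vector minus
    the model's expected predicate vector under p(c | x).
    """
    w = np.zeros((n_classes, dim))
    for _ in range(epochs):
        for f, c in data:
            p = maxent_probs(w, f)
            w[c] += lr * f                  # observed counts
            w -= lr * np.outer(p, f)        # expected counts
    return w

# Toy corpus: concept 0 fires feature 0 in region 0,
# concept 1 fires feature 1 in region 3.
data = [(predicates(0, 0), 0), (predicates(1, 3), 1),
        (predicates(0, 0), 0), (predicates(1, 3), 1)]
w = train(data, n_classes=2, dim=16)
print(maxent_probs(w, predicates(0, 0)).argmax())  # -> 0
```

The paper's joint observation predicates would extend this scheme with predicates that fire only when two low-level features co-occur, letting the model capture feature interactions within the same log-linear framework.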