A statistical modeling approach to content based retrieval

Statistical modeling for content based retrieval is examined in the context of the recent TREC Video benchmark exercise. The TREC Video exercise can be viewed as a test bed for evaluating and comparing a variety of algorithms on a set of high-level queries for multimedia retrieval. We report on the use of techniques adopted from statistical learning theory. Our method, like most statistical methods, depends on training models on large data sets. Many statistical models could be used, such as Gaussian mixture models and support vector machines; only a few of them are exploited in this preliminary report. Training requires a large amount of annotated (labeled) data. We therefore explore the use of active learning in the annotation engine to minimize the number of training samples that must be labeled for satisfactory performance.
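
To make the active learning idea concrete, the following is a minimal sketch and not the annotation engine itself. It assumes uncertainty sampling as the selection rule and a small logistic-regression classifier as the model, neither of which is specified above; the synthetic two-cluster data and the names `train_logistic`, `predict_proba`, `active_learning` and `oracle` are hypothetical and chosen only for illustration. In each round the current model is trained on the labeled pool, and the unlabeled sample whose predicted probability is closest to 0.5 is sent to the annotator.

```python
import numpy as np


def train_logistic(X, y, steps=500, lr=0.5):
    """Fit a logistic-regression model by plain gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Return the predicted probability of class 1 for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


def active_learning(X, oracle, seed_idx, budget):
    """Uncertainty sampling (an assumed selection rule, for illustration).

    `oracle(i)` stands in for the human annotator and returns the label
    of sample i. `budget` is the number of additional labels requested.
    """
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(budget):
        y = np.array([labels[i] for i in labeled], dtype=float)
        w = train_logistic(X[labeled], y)
        # The sample closest to p = 0.5 is the one the model is least sure of.
        score = -np.abs(predict_proba(w, X) - 0.5)
        score[labeled] = -np.inf  # never query an already labeled sample
        pick = int(np.argmax(score))
        labels[pick] = oracle(pick)
        labeled.append(pick)
    y = np.array([labels[i] for i in labeled], dtype=float)
    return train_logistic(X[labeled], y), labeled


# Synthetic stand-in for feature vectors: two Gaussian clusters in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)

w, queried = active_learning(X, lambda i: y_true[i], seed_idx=[0, 199], budget=10)
accuracy = float(np.mean((predict_proba(w, X) > 0.5).astype(int) == y_true))
print(f"labeled {len(queried)} of {len(X)} samples, accuracy {accuracy:.2f}")
```

In this sketch only 12 of the 200 samples are ever labeled. A different model or a different selection criterion could be substituted without changing the overall loop.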