Hybrid Learning Schemes for Multimedia Information Retrieval

Traditional database systems assume that users can specify precise query concepts (for example, by using a query language). For many search tasks, however, a query concept is hard to articulate, and its articulation can be subjective. Most users would find it hard to describe an image or music query in terms of low-level perceptual features. We believe that one desirable paradigm for search engines is to mine (i.e., to learn) users' query concepts through active learning. In this paper, we formulate query-concept learning as the problem of finding a binary classifier that separates objects relevant to the query concept from irrelevant ones. We propose two hybrid algorithms, pipeline learning and co-training, that are built on top of two active learning algorithms. Our empirical study shows that even when the feature dimension is very high and the target concepts are very specific, the hybrid algorithms can grasp a complex query concept in a small number of user iterations.
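
To make the formulation concrete, the following is a minimal sketch of query-concept learning as binary classification driven by active learning. It is not the pipeline-learning or co-training algorithm proposed in this paper, whose details are not given in the abstract. It assumes a generic uncertainty-sampling loop with a simple perceptron as the classifier, and all names (`train_perceptron`, `active_learning`, `oracle`) are hypothetical. The `oracle` function stands in for a user who marks each displayed object as relevant (+1) or irrelevant (-1).

```python
import random

def train_perceptron(labeled, dim, epochs=50):
    """Fit a linear separator w.x + b on (vector, label) pairs, label in {+1, -1}."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in labeled:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * s <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def score(model, x):
    """Signed value of w.x + b; its magnitude grows with distance from the boundary."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(model, x):
    """Return +1 (relevant) or -1 (irrelevant)."""
    return 1 if score(model, x) > 0 else -1

def active_learning(pool, oracle, seed_indices, rounds, per_round):
    """Assumed uncertainty-sampling loop: each round, ask the user (oracle)
    to label the unlabeled objects closest to the current boundary."""
    dim = len(pool[0])
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    model = train_perceptron([(pool[i], y) for i, y in labeled.items()], dim)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Most ambiguous objects first: smallest |score|.
        unlabeled.sort(key=lambda i: abs(score(model, pool[i])))
        for i in unlabeled[:per_round]:
            labeled[i] = oracle(pool[i])  # one user feedback per object
        model = train_perceptron([(pool[i], y) for i, y in labeled.items()], dim)
    return model, labeled

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D "feature vectors"; real perceptual features would be high-dimensional.
    pool = [[rng.random(), rng.random()] for _ in range(200)]
    # Hypothetical query concept, unknown to the learner.
    oracle = lambda x: 1 if x[0] + x[1] > 1.0 else -1
    model, labeled = active_learning(pool, oracle, [0, 1], rounds=5, per_round=4)
    agree = sum(predict(model, x) == oracle(x) for x in pool)
    print(f"labels requested: {len(labeled)}, agreement on pool: {agree}/{len(pool)}")
```

The design choice illustrated here is that each user iteration is spent on the objects the current classifier is least sure about, which is what lets an active learner refine a concept with few rounds of feedback.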