Learning optimal visual features from Web sampling in online image retrieval

We use linear discriminant analysis (LDA) to improve a Web image retrieval system. Our work takes place within the official European ImagEVAL 2006 evaluation campaign. The task consists of retrieving Web images using both textual (Web page) and visual information. Our visual features combine a subband entropy profile with the usual color mean and standard deviation. The visual scores are fused with a standard tf-idf analysis of the Web page text by a simple weighted norm. Our model ranked second in ImagEVAL task 2. We show how, by sampling image sets online from the Web, one can estimate an optimal visual feature subset for a given query concept with an approximated Fisher criterion, and thereby improve the mean average precision (MAP) of some concepts by 50%. We discuss the fact that some concepts do not benefit as much, but that on average this optimization reduces the visual dimension by 10 without any MAP degradation, yielding a significant reduction in CPU cost.
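
To make the feature selection step concrete, the following is a minimal sketch of ranking visual features with a per-feature Fisher score computed from two sampled image sets. It assumes a diagonal approximation (each dimension scored independently as squared mean difference over summed variances), which is only one possible reading of the "approximated Fisher criterion"; the function names `fisher_scores` and `select_features`, the `eps` constant, and the toy vectors are hypothetical and not taken from our system.

```python
from statistics import mean, pvariance

def fisher_scores(positives, negatives, eps=1e-12):
    """Per-feature Fisher score: (mu_pos - mu_neg)^2 / (var_pos + var_neg).

    positives / negatives: lists of equal-length feature vectors, e.g.
    images sampled from the Web for a concept versus other images.
    eps (an assumption) avoids division by zero for constant features.
    """
    n_features = len(positives[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row in positives]
        neg = [row[j] for row in negatives]
        between = (mean(pos) - mean(neg)) ** 2
        within = pvariance(pos) + pvariance(neg)
        scores.append(between / (within + eps))
    return scores

def select_features(positives, negatives, keep):
    """Return the indices of the `keep` highest-scoring features, sorted."""
    scores = fisher_scores(positives, negatives)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy data: feature 0 separates the two sets, features 1 and 2 barely do.
concept_images = [[0.9, 0.5, 0.10], [1.1, 0.4, 0.20]]
other_images = [[0.1, 0.5, 0.15], [-0.1, 0.6, 0.10]]
selected = select_features(concept_images, other_images, keep=1)
```

Retrieval for the concept would then compare images only on the selected dimensions, which is where the reduction in visual dimension and CPU cost comes from.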