Approximation of Linear Discriminant Analysis for Word Dependent Visual Features Selection

Automatically determining a set of keywords that describes the content of a given image is a difficult problem because of (i) the very high dimensionality of the visual space and (ii) the unsolved object segmentation problem. To address (i), we present a novel method based on an Approximation of Linear Discriminant Analysis (ALDA), considered from both the theoretical and the practical point of view. ALDA is more generic than standard LDA because it does not require an explicit class label for each training sample, yet it still allows efficient estimation of the discrimination power of the visual features. This is particularly useful given (ii) and the cost of manual object segmentation and labelling on large visual databases. In the first step of ALDA, for each word wk, the training set is split in two according to whether or not each image is labelled with wk. Then, under weak assumptions, we show theoretically that the between and within variances of these two sets give good estimates of the most discriminative features for wk. Experiments conducted on the COREL database show efficient word-adaptive feature selection and a large improvement (+37%) in an image Hierarchical Ascendant Classification (HAC), for which ALDA also lowers the computational cost by reducing the visual feature space by 90%.
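The per-word split and variance comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact estimators, so the prior-weighted between/within variance ratio used here is an assumption, and the names `alda_scores`, `select_features`, `has_word` and `keep_fraction` are hypothetical.

```python
import numpy as np


def alda_scores(X, has_word, eps=1e-12):
    """Per-feature between/within variance ratio for one word (assumed form).

    X        : (n_images, n_features) array of visual features.
    has_word : boolean array, True where the image is labelled with the word.
    """
    X = np.asarray(X, dtype=float)
    has_word = np.asarray(has_word, dtype=bool)
    pos, neg = X[has_word], X[~has_word]  # split the training set in two
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both subsets must be non-empty")

    n = len(X)
    p_pos, p_neg = len(pos) / n, len(neg) / n  # empirical subset priors
    mu = X.mean(axis=0)                        # global mean per feature

    # Between variance: spread of the two subset means around the global mean.
    between = (p_pos * (pos.mean(axis=0) - mu) ** 2
               + p_neg * (neg.mean(axis=0) - mu) ** 2)
    # Within variance: prior-weighted average of the two subset variances.
    within = p_pos * pos.var(axis=0) + p_neg * neg.var(axis=0)
    return between / (within + eps)


def select_features(X, has_word, keep_fraction=0.1):
    """Indices of the highest-scoring features, best first."""
    scores = alda_scores(X, has_word)
    k = max(1, int(round(keep_fraction * scores.size)))
    return np.argsort(scores)[::-1][:k]


# Synthetic demonstration data (not from the paper): feature 3 is shifted
# for the images labelled with the word, so it should rank first.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20))
labelled_demo = rng.random(200) < 0.3
X_demo[labelled_demo, 3] += 3.0
selected_demo = select_features(X_demo, labelled_demo, keep_fraction=0.1)
```

Here `keep_fraction=0.1` mirrors the 90% reduction of the feature space reported in the abstract; the scoring is run once per word, so each word gets its own feature subset.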
