Constructing models for content-based image retrieval

This paper presents a new method for constructing models from a set of positive and negative sample images; the method requires no manual extraction of significant objects or features. Our model representation is based on two layers. The first one consists of "generic" descriptors which represent sets of similar rotational invariant feature vectors. Rotation invariance allows to group similar, but rotated patterns and makes the method robust to model deformations. The second layer is the joint probability on the frequencies of the "generic" descriptors over neighborhoods. This probability is multi-modal and is represented by a set of "spatial-frequency" clusters. It adds a statistical spatial constraint which is rotationally invariant. Our two-layer representation is novel; it allows to efficiently capture "texture-like" visual structure. The selection of distinctive structure determines characteristic model features (common to the positive and rare in the negative examples) and increases the performance of the model. Models are retrieved and localized using a probabilistic score. Experimental results for "textured" animals and faces show a very good performance for retrieval as well as localization.

[1]  Dennis Gabor,et al.  Theory of communication , 1946 .

[2]  Anil K. Jain,et al.  Unsupervised texture segmentation using Gabor filters , 1990, 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings.

[3]  Anil K. Jain,et al.  Unsupervised texture segmentation using Gabor filters , 1990, 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings.

[4]  Christos Faloutsos,et al.  QBIC project: querying images by content, using color, texture, and shape , 1993, Electronic Imaging.

[5]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[6]  David A. Forsyth,et al.  Body plans , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Jitendra Malik,et al.  Color- and texture-based image segmentation using EM and its application to content-based image retrieval , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[9]  W. Eric L. Grimson,et al.  A framework for learning query concepts in image classification , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[10]  Paul A. Viola,et al.  A cluster-based statistical model for object detection , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11]  Carlo Tomasi,et al.  Texture-based image retrieval without segmentation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Jitendra Malik,et al.  Textons, contours and regions: cue integration in image segmentation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[13]  Yali Amit,et al.  A Computational Model for Visual Selection , 1999, Neural Computation.

[14]  Rachid Deriche,et al.  Geodesic active regions for supervised texture segmentation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[15]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[16]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[17]  Alan L. Yuille,et al.  Statistical cues for domain specific image segmentation with performance analysis , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).