Weakly Supervised Learning of Visual Models and Its Application to Content-Based Retrieval

This paper presents a method for weakly supervised learning of visual models. The visual model is based on a two-layer image description: a set of “generic” descriptors and their distribution over neighbourhoods. “Generic” descriptors represent sets of similar rotationally invariant feature vectors. Statistical spatial constraints describe the neighbourhood structure and make the description more discriminative. The joint probability of the frequencies of “generic” descriptors over a neighbourhood is multi-modal and is represented by a set of “neighbourhood-frequency” clusters. The image description is rotationally invariant, robust to model deformations, and efficiently characterizes “appearance-based” visual structure. Model features are determined by selecting distinctive clusters, that is, clusters that are common in the positive examples and rare in the negative ones. Visual models are retrieved and localized using a probabilistic score. Experimental results for “textured” animals and faces show very good performance for both retrieval and localization.
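
The two layers and the selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the fixed circular neighbourhood of radius `radius`, and the ratio-based distinctiveness score are all assumptions made for illustration, and the first-layer assignment of feature vectors to “generic” descriptors is taken as given (integer labels).

```python
import numpy as np

def neighbourhood_frequencies(labels, positions, n_generic, radius):
    """Second layer (sketch): for each feature point, the relative frequency
    of each "generic" descriptor label among the points within `radius`.
    Only pairwise distances are used, so the result does not change when
    the image is rotated."""
    labels = np.asarray(labels)
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = (dist <= radius).astype(float)   # each point counts itself
    one_hot = np.eye(n_generic)[labels]       # shape (n_points, n_generic)
    counts = within @ one_hot
    return counts / counts.sum(axis=1, keepdims=True)

def distinctiveness(pos_clusters, neg_clusters, n_clusters, eps=1e-6):
    """Hypothetical score: high when a neighbourhood-frequency cluster is
    common in positive examples and rare in negative ones."""
    pos = np.bincount(pos_clusters, minlength=n_clusters) / max(len(pos_clusters), 1)
    neg = np.bincount(neg_clusters, minlength=n_clusters) / max(len(neg_clusters), 1)
    return pos / (pos + neg + eps)

def select_model_features(pos_clusters, neg_clusters, n_clusters, top_k):
    """Indices of the `top_k` most distinctive clusters."""
    score = distinctiveness(pos_clusters, neg_clusters, n_clusters)
    return np.argsort(-score, kind="stable")[:top_k]
```

Here `pos_clusters` and `neg_clusters` stand for the cluster indices assigned to neighbourhoods drawn from positive and negative training images. In a fuller pipeline the frequency vectors returned by `neighbourhood_frequencies` would themselves be clustered to obtain those indices; that step is omitted here.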
