Categorizing Nine Visual Classes using Local Appearance Descriptors

We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine-invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naive Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for classifying nine semantic visual categories and comment on results obtained by Fergus et al. using a different method on the same data set. We obtain excellent results for both multi-class categorization and object detection. A thorough evaluation demonstrates that our method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.
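The pipeline described above (quantize local descriptors into a vocabulary, count keypoints per vocabulary entry, classify the resulting histogram) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the abstract says only "vector quantization", so plain k-means stands in for it; the 2-D tuples are toy stand-ins for real affine-invariant patch descriptors; the classifier is a multinomial Naive Bayes with Laplace smoothing; and all function names are hypothetical. The SVM variant is not shown.

```python
import math
from collections import Counter

def nearest(centers, d):
    """Index of the cluster center closest to descriptor d (squared Euclidean)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centers[j], d)))

def kmeans(descriptors, init_centers, iters=10):
    """Plain Lloyd's k-means (assumed quantizer); initial centers are passed in
    so the sketch is deterministic."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster received no points
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def histogram(centers, image_descriptors):
    """Bag of keypoints: number of descriptors assigned to each vocabulary entry."""
    counts = Counter(nearest(centers, d) for d in image_descriptors)
    return [counts.get(j, 0) for j in range(len(centers))]

def train_nb(histograms, labels):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
    k = len(histograms[0])
    model = {}
    for c in set(labels):
        rows = [h for h, y in zip(histograms, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + k
        model[c] = (math.log(len(rows) / len(labels)),
                    [math.log((t + 1) / denom) for t in totals])
    return model

def predict_nb(model, hist):
    """Return the class with the highest log posterior for a histogram."""
    return max(model, key=lambda c: model[c][0]
               + sum(n * lw for n, lw in zip(hist, model[c][1])))

# Toy training set: each image is a list of 2-D "descriptors" (hypothetical data).
train = [
    ("a", [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)]),
    ("a", [(0.1, 0.1), (0.0, 0.3), (0.2, 0.2)]),
    ("b", [(5.0, 4.9), (5.2, 5.0), (0.1, 0.0)]),
    ("b", [(4.9, 5.1), (5.1, 5.2), (5.0, 5.0)]),
]
all_descriptors = [d for _, img in train for d in img]
centers = kmeans(all_descriptors, init_centers=[(0.0, 0.0), (5.0, 5.0)])
model = train_nb([histogram(centers, img) for _, img in train],
                 [label for label, _ in train])

query = [(0.1, 0.2), (0.0, 0.0), (5.1, 5.0)]
print(predict_nb(model, histogram(centers, query)))
```

Note that the histogram discards where each keypoint lies in the image, which matches the abstract's remark that good accuracy is obtained without exploiting geometric information.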

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Tony Lindeberg,et al.  Scale-Space Theory in Computer Vision , 1993, Lecture Notes in Computer Science.

[4]  Theodoros Evgeniou,et al.  A TRAINABLE PEDESTRIAN DETECTION SYSTEM , 1998 .

[5]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[6]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[7]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[8]  Jitendra Malik,et al.  Recognizing surfaces using three-dimensional textons , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Carlo Tomasi,et al.  Texture-based image retrieval without segmentation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[10]  Thorsten Joachims,et al.  Text categorization with support vector machines , 1999 .

[11]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, ICML.

[13]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[14]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[15]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[16]  Jiri Matas,et al.  Object Recognition using the Invariant Pixel-Set Signature , 2000, BMVC.

[17]  Shigeo Abe.  Pattern Classification , 2001, Springer London.

[18]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[19]  Andrew Zisserman,et al.  Viewpoint invariant texture matching and wide baseline stereo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[20]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[21]  Harry Shum,et al.  Statistical Learning of Multi-view Face Detection , 2002, ECCV.

[22]  Andrew Zisserman,et al.  Classifying materials from images: to cluster or not to cluster? , 2002, European Conference on Computer Vision.

[23]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[24]  Cordelia Schmid,et al.  Learning to Parse Pictures of People , 2002, ECCV.

[25]  Lei Zhu,et al.  Theory of keyblock-based image retrieval , 2002, TOIS.

[26]  Sparse Texture Representation Using Affine-Invariant Neighborhoods , 2003, CVPR.

[27]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[28]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[29]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[30]  Nello Cristianini,et al.  Latent Semantic Kernels , 2001, Journal of Intelligent Information Systems.

[31]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.