Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set

Visual object categorization is one of the most active research topics in computer vision, and the Caltech-101 data set is one of the standard benchmarks for evaluating method performance. Despite its wide use, the data set has certain weaknesses: i) the objects appear in a practically standard pose and scale in the middle of the images, and ii) the background varies so little in certain categories that it is more discriminative than the foreground objects. In this work, we demonstrate how these weaknesses bias the evaluation results in an undesired manner. In addition, we reduce the bias by replacing the backgrounds with random landscape images from Google and by applying random Euclidean transformations to the foreground objects. We demonstrate that the proposed randomization process makes visual object categorization more challenging and improves the relative results of methods that categorize objects by their visual appearance and are invariant to pose changes. The new data set is made publicly available to other researchers.
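The randomization described above (a new background plus a random Euclidean transformation of the foreground) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rotation about the object centre, the nearest-neighbour sampling, and the availability of a binary foreground mask are all assumptions made for illustration, and the abstract does not state the ranges used for rotation or translation.

```python
import numpy as np


def random_euclidean_params(rng, max_shift):
    """Draw a random rotation angle (radians) and a translation (pixels).

    Hypothetical helper: the sampling ranges are assumptions, not values
    taken from the paper.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return theta, tx, ty


def composite_on_background(fg, mask, bg, theta, tx, ty):
    """Rotate the masked foreground by theta about its centre, shift it by
    (tx, ty) relative to the background centre, and paste it onto bg.

    fg:   (h, w, 3) foreground image
    mask: (h, w) boolean array, True on object pixels (assumed available)
    bg:   (H, W, 3) replacement background, same dtype as fg
    Returns the composite image and the boolean map of pasted pixels.
    """
    H, W = bg.shape[:2]
    h, w = mask.shape

    # Coordinates of every output pixel, relative to the placed object centre.
    ys, xs = np.mgrid[0:H, 0:W]
    dx = xs - ((W - 1) / 2.0 + tx)
    dy = ys - ((H - 1) / 2.0 + ty)

    # Inverse mapping: apply the transposed rotation to find source pixels.
    c, s = np.cos(theta), np.sin(theta)
    sx = np.rint(c * dx + s * dy + (w - 1) / 2.0).astype(int)
    sy = np.rint(-s * dx + c * dy + (h - 1) / 2.0).astype(int)

    # Keep only output pixels whose source lies inside the foreground image
    # and on the object mask.
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    hit = np.zeros((H, W), dtype=bool)
    hit[inside] = mask[sy[inside], sx[inside]]

    out = bg.copy()
    out[hit] = fg[sy[hit], sx[hit]]
    return out, hit
```

A typical call would draw parameters with `random_euclidean_params(np.random.default_rng(0), max_shift=20)` and pass them to `composite_on_background`. Inverse mapping is used so that every output pixel is assigned exactly once, which avoids the holes that forward mapping of a rotated object would leave.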
