Dictionary-free categorization of very similar objects via stacked evidence trees

Current work in object categorization discriminates among objects that typically possess gross differences which are readily apparent. However, many applications require making much finer distinctions. We address an insect categorization problem that is so challenging that even trained human experts cannot readily categorize images of insects considered in this paper. The state of the art that uses visual dictionaries, when applied to this problem, yields mediocre results (16.1% error). Three possible explanations for this are (a) the dictionaries are unsupervised, (b) the dictionaries lose the detailed information contained in each keypoint, and (c) these methods rely on hand-engineered decisions about dictionary size. This paper presents a novel, dictionary-free methodology. A random forest of trees is first trained to predict the class of an image based on individual keypoint descriptors. A unique aspect of these trees is that they do not make decisions but instead merely record evidence-i.e., the number of descriptors from training examples of each category that reached each leaf of the tree. We provide a mathematical model showing that voting evidence is better than voting decisions. To categorize a new image, descriptors for all detected keypoints are “dropped” through the trees, and the evidence at each leaf is summed to obtain an overall evidence vector. This is then sent to a second-level classifier to make the categorization decision. We achieve excellent performance (6.4% error) on the 9-class STONEFLY9 data set. Also, our method achieves an average AUC of 0.921 on the PASCAL06 VOC, which places it fifth out of 21 methods reported in the literature and demonstrates that the method also works well for generic object categorization.

[1]  H. Chernoff A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations , 1952 .

[2]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[3]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[4]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[5]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[6]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  S. Lazebnik,et al.  Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study , 2005 .

[8]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[12]  Thomas G. Dietterich,et al.  Principal Curvature-Based Region Detector for Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Thomas G. Dietterich,et al.  Automated Insect Identification through Concatenated Histograms of Local Appearance Features , 2007, WACV.

[14]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Thomas G. Dietterich,et al.  Learning visual dictionaries and decision lists for object recognition , 2008, 2008 19th International Conference on Pattern Recognition.

[16]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .