Dictionary-free categorization of very similar objects via stacked evidence trees

Current work in object categorization discriminates among objects that typically possess gross differences which are readily apparent. However, many applications require making much finer distinctions. We address an insect categorization problem that is so challenging that even trained human experts cannot readily categorize images of the insects considered in this paper. The state of the art based on visual dictionaries, when applied to this problem, yields mediocre results (16.1% error). Three possible explanations for this are that (a) the dictionaries are constructed without supervision, (b) the dictionaries lose the detailed information contained in each keypoint, and (c) these methods rely on hand-engineered decisions about dictionary size. This paper presents a novel, dictionary-free methodology. A random forest of trees is first trained to predict the class of an image based on individual keypoint descriptors. A unique aspect of these trees is that they do not make decisions but instead merely record evidence, i.e., the number of descriptors from training examples of each category that reached each leaf of the tree. We provide a mathematical model showing that voting evidence is better than voting decisions. To categorize a new image, descriptors for all detected keypoints are “dropped” through the trees, and the evidence at each leaf is summed to obtain an overall evidence vector, which is then sent to a second-level classifier that makes the final categorization decision. We achieve excellent performance (6.4% error) on the 9-class STONEFLY9 data set. Our method also achieves an average AUC of 0.921 on PASCAL06 VOC, which places it fifth out of the 21 methods reported in the literature and demonstrates that the approach works well for generic object categorization as well.
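
The following is a minimal sketch of the evidence-tree pipeline described above, written with scikit-learn. The helper names (make_evidence_tables, image_evidence) and the specific choices of RandomForestClassifier for the first level and LogisticRegression for the second-level classifier are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of stacked evidence trees: a random forest is trained on individual
# keypoint descriptors; each leaf records per-class descriptor counts
# ("evidence"); an image's descriptors are dropped through all trees, the leaf
# evidence is summed, and a second-level classifier labels the image.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def make_evidence_tables(forest, descriptors, labels, n_classes):
    """For every tree, count how many training descriptors of each class
    reach each leaf (the recorded evidence)."""
    leaves = forest.apply(descriptors)          # shape: (n_descriptors, n_trees)
    tables = []
    for t in range(leaves.shape[1]):
        table = {}
        for leaf, y in zip(leaves[:, t], labels):
            table.setdefault(leaf, np.zeros(n_classes))[y] += 1
        tables.append(table)
    return tables

def image_evidence(forest, tables, descriptors, n_classes):
    """Drop one image's descriptors through all trees and sum the per-leaf
    evidence vectors into a single evidence vector for the image."""
    leaves = forest.apply(descriptors)
    ev = np.zeros(n_classes)
    for t, table in enumerate(tables):
        for leaf in leaves[:, t]:
            ev += table.get(leaf, 0)
    return ev

# First level (hypothetical usage): X_desc holds SIFT-like descriptors,
# y_desc the class of the image each descriptor came from.
# forest = RandomForestClassifier(n_estimators=100).fit(X_desc, y_desc)
# tables = make_evidence_tables(forest, X_desc, y_desc, n_classes)
#
# Second level: a classifier on per-image evidence vectors.
# E = np.vstack([image_evidence(forest, tables, d, n_classes) for d in image_descs])
# stacker = LogisticRegression(max_iter=1000).fit(E, image_labels)
```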
