Discovering Efficiency in Coarse-To-Fine Texture Classification

We introduce a model for joint texture classification and segmentation that learns not only how to classify accurately, but when to classify efficiently. This model, combined with a complementary efficient feature representation that we describe, allows us to move beyond naive sliding-window classification strategies into sub-linear coarse-to-fine classification of an entire image. Recognition is formulated as a scale-space traversal through the image in which we can "stop short" at coarse scales, dramatically increasing both the speed and the accuracy of classification. Unlike other models, ours is constructed such that the classification produced when stopping short is exact (that is, equivalent to the classification produced when not stopping short), because coarse-to-fine efficiency is directly incorporated into the model. Classification is demonstrated on partially and fully annotated datasets of satellite and medical imagery.
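
The idea of stopping short during a coarse-to-fine traversal can be illustrated with a toy sketch. This is not the model described above: the per-pixel "classifier" is a simple intensity threshold, the stopping rule (a region is labeled at a coarse scale only when every pixel in it would receive the same label) is an assumption chosen so that the result is provably identical to dense per-pixel classification, and the names `summed_area_table`, `region_sum`, and `coarse_to_fine_labels` are hypothetical. A summed-area table stands in for the efficient feature representation, giving constant-time region queries.

```python
def summed_area_table(mask):
    """Return sat where sat[y][x] is the sum of mask[0:y][0:x]."""
    h, w = len(mask), len(mask[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += mask[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def region_sum(sat, y0, x0, y1, x1):
    """Sum of the mask over rows y0..y1-1 and columns x0..x1-1, in O(1)."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


def coarse_to_fine_labels(image, threshold):
    """Label every pixel by a quadtree traversal that stops at uniform regions.

    Toy assumption: the fine-scale label of a pixel is 1 if its value
    exceeds `threshold`, else 0. Returns (labels, visited), where `visited`
    counts the quadtree nodes that were examined.
    """
    h, w = len(image), len(image[0])
    mask = [[1 if v > threshold else 0 for v in row] for row in image]
    sat = summed_area_table(mask)
    labels = [[-1] * w for _ in range(h)]
    visited = 0
    stack = [(0, 0, h, w)]  # start at the coarsest scale: the whole image
    while stack:
        y0, x0, y1, x1 = stack.pop()
        if y0 >= y1 or x0 >= x1:
            continue  # empty region produced by an uneven split
        visited += 1
        area = (y1 - y0) * (x1 - x0)
        count = region_sum(sat, y0, x0, y1, x1)
        if count == 0 or count == area:
            # Stop short: all pixels agree, so the coarse label is exact.
            label = 1 if count else 0
            for y in range(y0, y1):
                for x in range(x0, x1):
                    labels[y][x] = label
            continue
        # Mixed region: descend one scale into four quadrants.
        ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
        stack.extend([(y0, x0, ym, xm), (y0, xm, ym, x1),
                      (ym, x0, y1, xm), (ym, xm, y1, x1)])
    return labels, visited


# Usage: an 8x8 image whose left half is dark and right half is bright.
image = [[0] * 4 + [10] * 4 for _ in range(8)]
labels, visited = coarse_to_fine_labels(image, threshold=5)
```

In this example the traversal examines only the root and its four quadrants rather than all 64 pixels, and the labels match dense thresholding. Writing out the dense label map is still linear in the image size; the saving is in the number of regions that must be examined.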
