Computational Strategies for Model-Based Scene Interpretation

Scene interpretation, in the sense of detecting and localizing instances from multiple object classes, is formulated as a two-step process in which non-contextual detection primes global interpretation. During detection, a list of instantiations (object identities and poses) is compiled, constrained only by invariance, that is, no missed detections, at the expense of false positives. Contextual information, such as expected relationships among poses, is incorporated afterwards to remove ambiguities. This division is motivated by computational efficiency. In addition, detection itself is organized as a coarse-to-fine search carried out simultaneously over class and pose. This search can be interpreted as a sequence of successive approximations to likelihood ratio tests arising from a simple (“naive Bayes”) statistical model for the edge maps extracted from the original images. The key to constructing efficient “hypothesis tests” for multiple classes and poses is local ORing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the shape and extent of spreading. These tradeoffs are analyzed mathematically within the model-based framework, and the whole procedure is illustrated by experiments in reading license plates.
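The two ingredients named above, local ORing of edges and a naive Bayes likelihood ratio test, can be illustrated with a minimal sketch. It is not the authors' implementation: the square spreading window, the function names (`spread_edges`, `template_features`, `log_likelihood_ratio`), the template locations, and the probabilities 0.8 and 0.2 are all assumptions chosen for illustration, and a single edge orientation is used.

```python
import numpy as np


def spread_edges(edges, radius):
    """Local OR: a pixel is on if any edge lies within a square window
    of half-width `radius` (assumed window shape; radius 0 is a no-op)."""
    edges = np.asarray(edges, dtype=bool)
    h, w = edges.shape
    padded = np.pad(edges, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the edge map inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def template_features(spread, locations):
    """Read the binary spread-edge features at hypothetical template locations."""
    return np.array([spread[r, c] for r, c in locations], dtype=float)


def log_likelihood_ratio(x, p_obj, p_bg):
    """Naive Bayes log likelihood ratio for independent binary features:
    sum_i x_i log(p_i / q_i) + (1 - x_i) log((1 - p_i) / (1 - q_i))."""
    x = np.asarray(x, dtype=float)
    p_obj = np.asarray(p_obj, dtype=float)
    p_bg = np.asarray(p_bg, dtype=float)
    on_term = x * np.log(p_obj / p_bg)
    off_term = (1.0 - x) * np.log((1.0 - p_obj) / (1.0 - p_bg))
    return float(np.sum(on_term + off_term))


if __name__ == "__main__":
    # Hypothetical two-edge template; the first image edge is shifted by one pixel.
    template = [(2, 2), (2, 6)]
    image = np.zeros((5, 9), dtype=bool)
    image[2, 3] = True
    image[2, 6] = True

    # Illustrative on-object and background edge probabilities (assumed values).
    p_obj = [0.8, 0.8]
    p_bg = [0.2, 0.2]

    for radius in (0, 1):
        x = template_features(spread_edges(image, radius), template)
        print(radius, x, log_likelihood_ratio(x, p_obj, p_bg))
```

With no spreading, the shifted edge is missed at its template location; with a window of half-width 1 it is recovered, which is the sense in which spread edges are locally invariant. The sketch holds the background probability fixed for simplicity, whereas in practice wider spreading would also make the feature more likely on background, which is the discrimination tradeoff the abstract refers to.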
