Object Boundary Detection in Images using a Semantic Ontology

We present a novel method for detecting the boundaries between objects in images that uses a large, hierarchical, semantic ontology: WordNet. The semantic object hierarchy in WordNet grounds this ill-posed segmentation problem, so that true boundaries are defined as edges between instances of different classes, and all other edges are clutter. To avoid fully classifying each pixel, which is very difficult in generic images, we evaluate the semantic similarity of the two regions bounding each edge in an initial oversegmentation. Semantic similarity is computed using WordNet enhanced with appearance information, and is largely orthogonal to visual similarity. Hence two regions with very similar visual attributes, but from different categories, can have a large semantic distance and therefore provide evidence of a strong boundary between them, and vice versa. The ontology is trained with images from the UC Berkeley image segmentation benchmark, extended with manual labeling of the semantic content of each image segment. Results on boundary detection against the benchmark images show that semantic similarity computed through WordNet can significantly improve boundary detection compared to generic segmentation.
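To make the core idea concrete, the sketch below (not the authors' implementation) scores an edge between two oversegmented regions by the semantic distance between their hypothesized WordNet categories, so that visually similar but semantically distant regions still yield a strong boundary. It uses NLTK's WordNet interface; the label names and the `boundary_strength` helper are illustrative assumptions, and the paper's combination with appearance information is not shown.

```python
# Minimal sketch of semantic boundary evidence from WordNet distance.
# Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).

from nltk.corpus import wordnet as wn


def semantic_distance(label_a: str, label_b: str) -> float:
    """Return a distance in [0, 1] between two category labels,
    using WordNet path similarity between their first noun synsets."""
    syn_a = wn.synsets(label_a, pos=wn.NOUN)
    syn_b = wn.synsets(label_b, pos=wn.NOUN)
    if not syn_a or not syn_b:
        return 1.0  # unknown label: treat as maximally dissimilar
    sim = syn_a[0].path_similarity(syn_b[0]) or 0.0
    return 1.0 - sim


def boundary_strength(label_a: str, label_b: str) -> float:
    """Edge strength rises with semantic distance: edges between
    instances of different classes are true boundaries, edges
    within a single class are treated as clutter."""
    return semantic_distance(label_a, label_b)


if __name__ == "__main__":
    # 'horse' vs 'grass': large semantic distance -> strong boundary.
    print(boundary_strength("horse", "grass"))
    # 'horse' vs 'pony': small semantic distance -> weak boundary.
    print(boundary_strength("horse", "pony"))
```

In practice one would replace the raw path similarity with a measure trained on labeled segments, as the abstract describes, but the structure of the decision (semantic distance driving boundary evidence) is the same.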
