EXTENT: fusing context, content, and semantic ontology for photo annotation

This architecture paper presents EXTENT, a probabilistic framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. EXTENT fuses contextual information (location, time, and camera parameters), photo content (perceptual features), and semantic ontology in a synergistic way. It uses causal strengths to encode causalities between variables, and between variables and semantic labels. Through a landmark-recognition case study, we show that EXTENT can provide high-quality annotation, substantially better than any traditional unimodal methods.

[1]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[2]  Mor Naaman,et al.  Context data in geo-referenced digital photo collections , 2004, MULTIMEDIA '04.

[3]  Mor Naaman,et al.  From Where to What: Metadata Sharing for Digital Photographs with Geographic Coordinates , 2003, OTM.

[4]  Jiebo Luo,et al.  Beyond pixels: Exploiting camera metadata for photo classification , 2005, Pattern Recognit..

[5]  P. Cheng,et al.  Assessing interactive causal influence. , 2004, Psychological review.

[6]  Anind K. Dey,et al.  Understanding and Using Context , 2001, Personal and Ubiquitous Computing.

[7]  David Heckerman,et al.  A Bayesian Approach to Learning Causal Networks , 1995, UAI.

[8]  Edward Y. Chang,et al.  SVM binary classifier ensembles for image classification , 2001, CIKM '01.

[9]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[10]  J. Tenenbaum,et al.  Generalization, similarity, and Bayesian inference. , 2001, The Behavioral and brain sciences.

[11]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[12]  Ross D. Shachter,et al.  Decision-Theoretic Foundations for Causal Reasoning , 1995, J. Artif. Intell. Res..

[13]  Diomidis Spinellis Position-Annotated Photographs: A Geotemporal Web , 2003, IEEE Pervasive Comput..

[14]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[15]  Simon King,et al.  From context to content: leveraging context to infer media metadata , 2004, MULTIMEDIA '04.

[16]  Edward Y. Chang,et al.  Using one-class and two-class SVMs for multiclass image annotation , 2005, IEEE Transactions on Knowledge and Data Engineering.

[17]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.