Fusing Content and Context with Causality

This chapter presents a generative framework that uses influence diagrams to fuse multimodal metadata for photo annotation. We combine contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and a semantic ontology so that each source compensates for the limitations of the others. We use causal strengths to encode the causal relationships among variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach achieves substantially better annotation quality and interpretability than traditional methods.
