Discriminating Image Senses by Clustering with Multimodal Features

We discuss Image Sense Discrimination (ISD) and apply a method based on spectral clustering that uses multimodal features drawn from both the image and the text of the web page in which it is embedded. We evaluate the method on a new data set of annotated web images retrieved with ambiguous query terms. Our experiments investigate different levels of sense granularity, the relative impact of text and image features, and the difference between global and local text features.
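To make the idea concrete, the following is a minimal sketch of a two-way spectral cut over a combined text and image affinity. It is not the method as implemented in the paper: the toy feature vectors, the mixing weight `alpha`, the cosine affinity, and the function names are all assumptions made for illustration, and a simple sign-based bipartition stands in for a full k-way spectral clustering.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-negative feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_affinity(text_feats, image_feats, alpha=0.5):
    """Affinity matrix mixing text and image similarity (alpha is an assumed weight)."""
    n = len(text_feats)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # zero diagonal
                A[i][j] = (alpha * cosine(text_feats[i], text_feats[j])
                           + (1 - alpha) * cosine(image_feats[i], image_feats[j]))
    return A

def spectral_bipartition(A, iters=500):
    """Split items in two using the second eigenvector of D^-1/2 A D^-1/2.

    Assumes every item has positive total affinity. The eigenvector is found
    by power iteration on the shifted matrix (I + D^-1/2 A D^-1/2) / 2, whose
    eigenvalues are non-negative, while projecting out the known leading
    eigenvector, which is proportional to sqrt(degree).
    """
    n = len(A)
    s = [math.sqrt(sum(row)) for row in A]           # sqrt of degrees
    s_norm = math.sqrt(sum(x * x for x in s))
    v1 = [x / s_norm for x in s]                     # leading eigenvector
    v = [(-1.0) ** i * (i + 1) for i in range(n)]    # deterministic start vector
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, v1))
        v = [a - proj * b for a, b in zip(v, v1)]    # deflate leading eigenvector
        w = [0.5 * (v[i] + sum(A[i][j] * v[j] / (s[i] * s[j]) for j in range(n)))
             for i in range(n)]
        w_norm = math.sqrt(sum(x * x for x in w))
        v = [x / w_norm for x in w]
    # The sign of each entry decides the cluster; which side gets 0 or 1 is arbitrary.
    return [1 if x >= 0 else 0 for x in v]

# Hypothetical toy data for an ambiguous query: word counts from the page text
# and a three-bin colour histogram per image. Items 0-2 and 3-5 are meant to
# represent two different senses.
text_feats = [[3, 1, 0, 0], [2, 2, 0, 0], [1, 3, 0, 1],
              [0, 0, 3, 1], [0, 0, 2, 2], [0, 1, 1, 3]]
image_feats = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
               [0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]

labels = spectral_bipartition(combined_affinity(text_feats, image_feats))
print(labels)
```

Setting `alpha` to 1.0 or 0.0 in this sketch gives a text-only or image-only affinity, which is one simple way to compare the contribution of each modality.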
