Paper information - Unsupervised Learning of Visual Sense Models for Polysemous Words

Unsupervised Learning of Visual Sense Models for Polysemous Words

Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.
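The retrieval step described above can be illustrated with a minimal sketch. It assumes that a topic model such as LDA has already produced a topic distribution P(z | d) for the text surrounding each image link, and that a per-topic sense likelihood P(s | z) has been estimated from the dictionary definition. The marginalization P(s | d) = Σ_z P(s | z) P(z | d) is an assumption of this sketch rather than something stated in the abstract, and all names and numbers (`sense_probability`, `rank_by_sense`, the toy pages for the query "mouse") are hypothetical.

```python
from typing import Dict, List, Sequence


def sense_probability(doc_topics: Sequence[float],
                      sense_given_topic: Sequence[float]) -> float:
    """Score a document for one sense: sum over topics of P(s | z) * P(z | d)."""
    if len(doc_topics) != len(sense_given_topic):
        raise ValueError("topic vectors must have the same length")
    return sum(p_z * p_s for p_z, p_s in zip(doc_topics, sense_given_topic))


def rank_by_sense(docs: Dict[str, Sequence[float]],
                  sense_given_topic: Sequence[float]) -> List[str]:
    """Return document ids sorted from most to least probable for the sense."""
    return sorted(
        docs,
        key=lambda doc_id: sense_probability(docs[doc_id], sense_given_topic),
        reverse=True,
    )


# Toy data (hypothetical): three latent topics, three web pages for "mouse".
# Each vector is an assumed P(z | d) inferred from the text around an image link.
docs = {
    "page_animal": [0.8, 0.1, 0.1],
    "page_device": [0.1, 0.8, 0.1],
    "page_mixed": [0.4, 0.4, 0.2],
}

# Assumed P(sense = "rodent" | z), e.g. estimated from a dictionary definition.
rodent_sense = [0.9, 0.05, 0.3]

ranking = rank_by_sense(docs, rodent_sense)
print(ranking)
```

In a pipeline of this kind, the images linked from the top-ranked pages would then serve as training data for a sense-specific object classifier.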

Trevor Darrell | Kate Saenko

[1] David Yarowsky, et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[2] M. F. Porter, et al. An algorithm for suffix stripping, 1997.

[3] Mingjing Li, et al. Web mining for Web image retrieval, 2001, J. Assoc. Inf. Sci. Technol.

[4] Michael I. Jordan, et al. Modeling annotated data, 2003, SIGIR.

[5] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2003, J. Mach. Learn. Res.

[6] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[7] Mark Steyvers, et al. Finding scientific topics, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[8] Pietro Perona, et al. Learning object categories from Google's image search, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9] David A. Forsyth, et al. Animals on the Web, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10] Keiji Yanai, et al. Cross Modal Disambiguation, 2006, Toward Category-Level Object Recognition.

[11] Andrew McCallum, et al. People-LDA: Anchoring Topics to People using Face Recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12] Antonio Criminisi, et al. Harvesting Image Databases from the Web, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13] Fei-Fei Li, et al. OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.