Image Interpretation Using Large Corpus: Wikipedia

Image is a powerful medium for expressing one's ideas and rightly confirms the adage, “One picture is worth a thousand words.” In this work, we explore the application of world knowledge in the form of Wikipedia to achieve this objective-literally. In the first part, we disambiguate and rank semantic concepts associated with ambiguous keywords by exploiting link structure of articles in Wikipedia. In the second part, we explore an image representation in terms of keywords which reflect the semantic content of an image. Our approach is inspired by the desire to augment low-level image representation with massive amounts of world knowledge, to facilitate computer vision tasks like image retrieval based on this information. We represent an image as a weighted mixture of a predetermined set of concrete concepts whose definition has been agreed upon by a wide variety of audience. To achieve this objective, we use concepts defined by Wikipedia articles, e.g., sky, building, or automobile. An important advantage of our approach is availability of vast amounts of highly organized human knowledge in Wikipedia. Wikipedia evolves rapidly steadily increasing its breadth and depth over time.

[1]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[4]  Andrew Zisserman,et al.  Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?" , 2002, ECCV.

[5]  Thomas S. Huang,et al.  Which "Apple" are you talking about ? , 2008, WWW.

[6]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[8]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[9]  Andrew Zisserman,et al.  Wide baseline stereo matching , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[10]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[11]  James J. Little,et al.  Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks , 2002, Int. J. Robotics Res..

[12]  Andrew Zisserman,et al.  Video data mining using configurations of viewpoint invariant regions , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[13]  Wei-Ying Ma,et al.  Learning to cluster web search results , 2004, SIGIR '04.

[14]  Andrew Zisserman,et al.  An Affine Invariant Salient Region Detector , 2004, ECCV.

[15]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[16]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[17]  Matthew A. Brown,et al.  Recognising panoramas , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[18]  Rada Mihalcea,et al.  Using Wikipedia for Automatic Word Sense Disambiguation , 2007, NAACL.

[19]  Cordelia Schmid,et al.  Affine-invariant local descriptors and neighborhood statistics for texture recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Tom Minka,et al.  Vision texture for annotation , 1995, Multimedia Systems.

[21]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[24]  Gang Wang,et al.  Object image retrieval by exploiting online knowledge resources , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[26]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[27]  Richard S. Zemel,et al.  Latent topic random fields: Learning using a taxonomy of labels , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Andrew Zisserman,et al.  Automated location matching in movies , 2003, Comput. Vis. Image Underst..

[29]  Manuel Blum,et al.  Peekaboom: a game for locating objects in images , 2006, CHI.

[30]  David A. Forsyth,et al.  Words and Pictures in the News , 2003, HLT-NAACL 2003.

[31]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[32]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[33]  Mor Naaman,et al.  Generating diverse and representative image search results for landmarks , 2008, WWW.

[34]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[35]  Meng Wang,et al.  Visual query suggestion , 2009, ACM Multimedia.

[36]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[37]  D. West Introduction to Graph Theory , 1995 .

[38]  Chih-Hung Lin,et al.  Using Semantic Graphs for Image Search , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[39]  Alvy Ray Smith,et al.  Color gamut transform pairs , 1978, SIGGRAPH.

[40]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  David G. Stork,et al.  Pattern Classification , 1973 .

[42]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[43]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Bernardo A. Huberman,et al.  Assessing the value of cooperation in Wikipedia , 2007, First Monday.

[45]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[46]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[47]  Luc Van Gool,et al.  Content-Based Image Retrieval Based on Local Affinely Invariant Regions , 1999, VISUAL.

[48]  Luc Van Gool,et al.  Efficient grouping under perspective skew , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[49]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.