Image understanding and the web: a state-of-the-art review

The contextual information of Web images is investigated in order to characterize their content with semantic descriptors and thereby bridge the semantic gap, i.e. the gap between their automated low-level representation in terms of colors, textures, shapes, etc., and their semantic interpretation. Such characterization allows the image content to be understood and is crucial in important Web-based tasks such as image indexing and retrieval. Although we are motivated by the availability of rich knowledge on the Web and by the relative success of commercial search engines in automatically characterizing image content using contextual information in Web pages, we are aware that the unpredictable quality of this contextual information is a major limiting factor. Some of the reasons why image contextual information is difficult to leverage relate to its characterization and extraction. The first issue is the lack of large-scale studies that establish what is considered the relevant contextual information of an image, where it is located in a Web page, and whether it is consistent across Web pages of different types, content layouts and domains. The extraction of this contextual information is also an open problem, as state-of-the-art automated extraction tools are unable to handle the heterogeneous Web. As far as the processing of the contextual information is concerned, the syntactic and semantic characterization of the textual components must be addressed in order to tackle the semantic gap. Furthermore, questions arise about how to organize these textual components into coherent structures that are usable in image indexing and retrieval frameworks.
To address these issues, we lay out the anatomy of a generic context-based Web image understanding framework and propose its stage-based decomposition, covering topical issues from information indexing and retrieval, image description models, natural language processing, webpage segmentation and automated information extraction. For each of the identified stages, we review state-of-the-art solutions from the literature, categorized and analyzed in light of the techniques used.


[65]  Jing Liu,et al.  Image annotation via graph learning , 2009, Pattern Recognit..

[66]  Beng Chin Ooi,et al.  Giving meanings to WWW images , 2000, MM 2000.

[67]  EunKyung Chung,et al.  An exploratory analysis on unsuccessful image searches , 2010, ASIST.

[68]  Nathan Schneider,et al.  Association for Computational Linguistics: Human Language Technologies , 2011 .

[69]  Clement T. Yu,et al.  Using semantic contents and WordNet in image retrieval , 1997, SIGIR '97.

[70]  Naresh K. Malhotra,et al.  Marketing Research: An Applied Orientation , 1993 .

[71]  Frederick H. Lochovsky,et al.  Data extraction and label assignment for web databases , 2003, WWW '03.

[72]  Yansong Feng,et al.  Topic Models for Image Annotation and Text Illustration , 2010, HLT-NAACL.

[73]  Vance W. Berger,et al.  Binomial Distribution: Estimating and Testing Parameters , 2005 .

[74]  Hector Garcia-Molina,et al.  Extracting structured data from Web pages , 2003, SIGMOD '03.

[75]  Sharad Mehrotra,et al.  WebMARS : A Multimedia Search Engine for Full Document Retrieval and Cross Media Browsing , 2000 .

[76]  Marti A. Hearst Clustering versus faceted categories for information exploration Communications of the , 2006 .

[77]  Corinne Jörgensen,et al.  Image querying by image professionals , 2005, J. Assoc. Inf. Sci. Technol..

[78]  Michael S. Lew Next-Generation Web Searches for Visual Content , 2000, Computer.

[79]  John P. Eakins,et al.  Towards intelligent image retrieval , 2002, Pattern Recognit..

[80]  Ted Pedersen,et al.  WordNet::SenseRelate::AllWords - A Broad Coverage Word Sense Tagger that Maximizes Semantic Relatedness , 2009, NAACL.

[81]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Mingjing Li,et al.  iFind: a web image search engine , 2001, SIGIR '01.

[83]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[84]  Hsiao-Tieh Pu,et al.  An analysis of failed queries for web image retrieval , 2008, J. Inf. Sci..

[85]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[86]  Valter Crescenzi,et al.  RoadRunner: Towards Automatic Data Extraction from Large Web Sites , 2001, VLDB.

[87]  Jiebo Luo,et al.  Event recognition: viewing the world with a third eye , 2008, ACM Multimedia.

[88]  Marie-Francine Moens,et al.  Text Analysis for Automatic Image Annotation , 2007, ACL.

[89]  Shih-Fu Chang,et al.  Conceptual framework for indexing visual information at multiple levels , 1999, Electronic Imaging.

[90]  Jing Hua,et al.  Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering , 2008, WWW.

[91]  Ricardo Baeza-Yates,et al.  A Web Search Analysis Considering the Intention behind Queries , 2008, 2008 Latin American Web Conference.

[92]  Dekang Lin,et al.  Automatic Identification of Non-compositional Phrases , 1999, ACL.

[93]  Wei-Ying Ma,et al.  Iteratively clustering web images based on link and attribute reinforcements , 2005, ACM Multimedia.

[94]  Nicolas Tsapatsoulis,et al.  Extraction of Web Image Information: Semantic or Visual Cues? , 2012, AIAI.

[95]  Marcel Worring,et al.  Classification of user image descriptions , 2004, Int. J. Hum. Comput. Stud..

[96]  Peter G. B. Enser,et al.  Analysis of user need in image archives , 1997, J. Inf. Sci..

[97]  Thomas E. Payne Describing Morphosyntax: A Guide for Field Linguists , 1997 .

[98]  Jean-Pierre Chanod,et al.  Robustness beyond shallowness: incremental deep parsing , 2002, Natural Language Engineering.

[99]  Wei Liu,et al.  ViDE: A Vision-Based Approach for Deep Web Data Extraction , 2010, IEEE Transactions on Knowledge and Data Engineering.

[100]  Sam Liu,et al.  Web document text and images extraction using DOM analysis and natural language processing , 2009, DocEng '09.

[101]  Djemel Ziou,et al.  Image Retrieval from the World Wide Web: Issues, Techniques, and Systems , 2004, CSUR.

[102]  Marti A. Hearst Clustering versus faceted categories for information exploration , 2006, Commun. ACM.

[103]  Sougata Mukherjea,et al.  AMORE: a world-wide web image retrieval engine , 1999, CHI Extended Abstracts.

[104]  Kentaro Toyama,et al.  Geographic location tags on digital images , 2003, ACM Multimedia.

[105]  Sanjeev Khudanpur,et al.  Hidden Markov models for automatic annotation and content-based retrieval of images and video , 2005, SIGIR '05.

[106]  Tat-Seng Chua,et al.  A bootstrapping framework for annotating and retrieving WWW images , 2004, MULTIMEDIA '04.

[107]  Arnold W. M. Smeulders,et al.  Color constant ratio gradients for image segmentation and similarity of texture objects , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[108]  Wei-Ying Ma,et al.  Detecting web page structure for adaptive viewing on small form factor devices , 2003, WWW '03.

[109]  Ming-Syan Chen,et al.  WISDOM: Web intrapage informative structure mining based on document object model , 2005, IEEE Transactions on Knowledge and Data Engineering.

[110]  Hanqing Lu,et al.  Semantic knowledge extraction and annotation for web images , 2005, MULTIMEDIA '05.

[111]  Clement T. Yu,et al.  Evaluating strategies and systems for content based indexing of person images on the Web , 2000, ACM Multimedia.

[112]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[113]  Patrick Gallinari,et al.  Learning to Extract Content from News Webpages , 2009, 2009 International Conference on Advanced Information Networking and Applications Workshops.

[114]  Marti A. Hearst,et al.  Adaptive Multilingual Sentence Boundary Disambiguation , 1997, CL.

[115]  Wei-Ying Ma,et al.  A probabilistic semantic model for image annotation and multi-modal image retrieval , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[116]  Tao Qin,et al.  Web image clustering by consistent utilization of visual features and surrounding texts , 2005, MULTIMEDIA '05.

[117]  Susan T. Dumais,et al.  Improving the retrieval of information from external sources , 1991 .

[118]  Philippe Mulhem,et al.  A Full-Text Framework for the Image Retrieval Signal/Semantic Integration , 2005, DEXA.

[119]  Marie-Francine Moens,et al.  Finding the Best Picture: Cross-Media Retrieval of Content , 2008, ECIR.

[120]  Khaled Shaalan,et al.  A Survey of Web Information Extraction Systems , 2006, IEEE Transactions on Knowledge and Data Engineering.

[121]  C. V. Jawahar,et al.  Multi modal semantic indexing for image retrieval , 2010, CIVR '10.

[122]  Ana Ibáñez Moreno On the categorization of locational expressions: a funtional account , 2004 .

[123]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[124]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[125]  Rada Mihalcea,et al.  Word Sense Disambiguation , 2015, Encyclopedia of Machine Learning.

[126]  Amanda Spink,et al.  Image searching on the Excite Web search engine , 2001, Inf. Process. Manag..

[127]  Hector Garcia-Molina,et al.  Extracting Semistructured Information from the Web. , 1997 .

[128]  Wolfgang Nejdl,et al.  A densitometric approach to web page segmentation , 2008, CIKM '08.

[129]  Corinne Jörgensen,et al.  Image querying by image professionals: Research Articles , 2005 .

[130]  Barbara J. Grosz,et al.  Natural-Language Processing , 1982, Artificial Intelligence.

[131]  Valter Crescenzi,et al.  Grammars Have Exceptions , 1998, Inf. Syst..

[132]  Ximena Olivares,et al.  Boosting image retrieval through aggregating search results based on visual annotations , 2008, ACM Multimedia.