Automatic annotation of image databases based on implicit crowdsourcing, visual concept modeling and evolution

This paper proposes a novel approach for automatically annotating image databases. Unlike most current schemes, which rely solely on spatial content analysis, the proposed method combines several innovative modules for semantically annotating images. In particular, it includes: (a) a GWAP-oriented (games with a purpose) interface for optimized collection of implicit crowdsourcing data, (b) a new unsupervised visual concept modeling algorithm for content description, and (c) a hierarchical visual content display method, based on graph partitioning, for easy data navigation. The proposed scheme can be readily adopted by any multimedia search engine, providing an intelligent way to annotate completely non-annotated content or to correct wrongly annotated images. The approach currently provides promising results on both standard and generic datasets of limited size, and it is expected to add significant value, especially for the billions of non-annotated images on the Web. Furthermore, expert annotators can gain important knowledge about new user trends, language idioms, and styles of searching.
