The accuracy and value of machine-generated image tags: design and user evaluation of an end-to-end image tagging system

Automated image tagging is a problem of great interest, due to the proliferation of photo sharing services. Researchers have achieved considerable advances in understanding motivations and usage of tags, recognizing relevant tags from image content, and leveraging community input to recommend more tags. In this work we address several important issues in building an end-to-end image tagging application, including tagging vocabulary design, taxonomy-based tag refinement, classifier score calibration for effective tag ranking, and selection of valuable tags, rather than just accurate ones. We surveyed users to quantify tag utility and error tolerance, and use this data in both calibrating scores from automatic classifiers and in taxonomy based tag expansion. We also compute the relative importance among tags based on user input and statistics from Flickr. We present an end-to-end system evaluated on thousands of user-contributed photos using 60 popular tags. We can issue four tags per image with over 80% accuracy, up from 50% baseline performance, and we confirm through a comparative user study that value-ranked tags are preferable to accuracy-ranked tags.

[1]  Jon Howell,et al.  Asirra: a CAPTCHA that exploits interest-aligned manual image categorization , 2007, CCS '07.

[2]  Hugo Liu,et al.  ConceptNet — A Practical Commonsense Reasoning Tool-Kit , 2004 .

[3]  Mor Naaman,et al.  HT06, tagging paper, taxonomy, Flickr, academic article, to read , 2006, HYPERTEXT '06.

[4]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[5]  Bernardo A. Huberman,et al.  Usage patterns of collaborative tagging systems , 2006, J. Inf. Sci..

[6]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[7]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[10]  Douglas B. Lenat,et al.  Mapping Ontologies into Cyc , 2002 .

[11]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[14]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[15]  Wolfgang Nejdl,et al.  Can all tags be used for search? , 2008, CIKM '08.

[16]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[17]  Kilian Q. Weinberger,et al.  Resolving tag ambiguity , 2008, ACM Multimedia.

[18]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[20]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[21]  Rong Yan,et al.  IBM multimedia analysis and retrieval system , 2008, CIVR '08.

[22]  Chin-Hui Lee,et al.  Enhancing image annotation by integrating concept ontology and text-based bayesian learning model , 2007, ACM Multimedia.

[23]  Rong Yan,et al.  Model-shared subspace boosting for multi-label classification , 2007, KDD '07.

[24]  Andrea Esuli,et al.  CoPhIR: a Test Collection for Content-Based Image Retrieval , 2009, ArXiv.

[25]  Roelof van Zwol,et al.  Flickr tag recommendation based on collective knowledge , 2008, WWW.

[26]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[28]  Ingmar Weber,et al.  Personalized tag suggestion for flickr , 2008, WWW.

[29]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[30]  Mor Naaman,et al.  Why we tag: motivations for annotation in mobile and online media , 2007, CHI.

[31]  Mor Naaman,et al.  Towards extracting flickr tag semantics , 2007, WWW '07.

[32]  Kilian Q. Weinberger,et al.  Reliable tags using image similarity: mining specificity and expertise from large-scale multimedia databases , 2009, WSMC '09.