Photo annotation: a survey

Because of the large number of photos being generated today, techniques to organize, search for, and retrieve such images are essential. Photo annotation plays a key role in these mechanisms because it links raw data (photos) to the specific information that people need to handle large amounts of content. However, generating photo annotations remains difficult, as it is part of a well-known challenge called the semantic gap. In this paper, we conduct a literature review to investigate the most popular methods used to produce photo annotations. Based on the papers surveyed, we identify People (“Who?”), Location (“Where?”), and Event (“Where? When?”) as the most important features of photo annotation. We also compare similar photo annotation methods, highlighting key aspects of the most commonly used approaches. Moreover, we provide an overview of a general photo annotation process and present the main aspects of photo annotation representation, including formats, usage contexts, advantages, and disadvantages. Finally, we discuss ways to improve photo annotation methods and present some guidelines for future research.