Member activities and quality of tags in a collection of historical photographs in Flickr

There is growing interest in, and an increasing number of attempts by, traditional information providers to engage social content creation and sharing communities in creating and enhancing the metadata of their photo collections, making those collections more accessible and visible. To enable and guide effective metadata creation, however, it is essential to understand the structure and patterns of community activity around the photographs, the resources community members use, and the scale and quality of the socially created metadata relative to the metadata and knowledge already encoded in existing knowledge organization systems. This article presents an analysis of Flickr member discussions around the photographs of the Library of Congress photostream on Flickr. The article also reports on an analysis of the intrinsic and relational quality of the photostream tags relative to two knowledge organization systems: the Thesaurus for Graphic Materials (TGM) and the Library of Congress Subject Headings (LCSH). Thirty-seven percent of the original tag set and 15.3% of the preprocessed set (after tags with fewer than three characters and URLs were removed) were invalid or misspelled terms. Nouns, named entity terms, and complex terms constituted approximately 77% of the preprocessed set. More than half of the photostream tags were not found in the TGM and the LCSH, and more than a quarter of those terms were regular nouns and noun phrases. This suggests that these terms could complement more traditional indexing methods that use controlled vocabularies.

Introduction

Knowledge organization and representation systems (e.g., lists of terms, taxonomies, thesauri, ontologies) have traditionally been essential parts of the information organization and retrieval infrastructure in libraries and museums. They have now become increasingly important on the Web, where they support entity and concept identification, semantic annotation, information retrieval, and question answering (e.g., Perez, 2009).
Not surprisingly, there has been considerable research on controlled vocabulary and ontology construction, including research identifying quality index terms and on automatic concept and