Towards automatic extraction of event and place semantics from Flickr tags

We describe an approach for extracting the semantics of tags, the unstructured text labels assigned to resources on the Web, based on each tag's usage patterns. In particular, we focus on extracting place and event semantics for tags assigned to photos on Flickr, a popular photo-sharing website that supports time and location (latitude/longitude) metadata. We analyze two methods inspired by well-known burst-analysis techniques and one novel method, Scale-structure Identification. We evaluate the methods on a subset of Flickr data and show that Scale-structure Identification outperforms the existing techniques. The approach and methods described in this work can be applied in other domains, such as geo-annotated web pages, where text terms can be extracted and associated with usage patterns.
