Resolving tag ambiguity

Tagging is an important way for users to succinctly describe the content they upload to the Internet. However, most tag-suggestion systems recommend words that are highly correlated with the existing tag set, and thus add little information to a user's contribution. This paper describes a means to determine the ambiguity of a set of (user-contributed) tags and suggests new tags that disambiguate the original tags. We introduce a probabilistic framework that allows us to find two tags that appear in different contexts but are both likely to co-occur with the original tag set. If such tags can be found, the current description is considered "ambiguous" and the two tags are recommended to the user for further clarification. In contrast to previous work, we only query the user when information is most needed and good suggestions are available. We verify the efficacy of our approach using geographical, temporal and semantic metadata, and a user study. We built our system using statistics from a large (100M) database of images and their tags.
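
The core test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described here: it assumes that a tag's "context" is its smoothed co-occurrence distribution over other tags, that "different contexts" is measured with a symmetrized Kullback-Leibler divergence, and that a tag set is called ambiguous when the best candidate pair exceeds a fixed threshold. The function names, the toy data, and the parameters `min_cooccur`, `smoothing`, and `threshold` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): each entry is the tag set of one image.
PHOTOS = [
    ["cambridge", "uk", "university", "punting"],
    ["cambridge", "uk", "river", "punting"],
    ["cambridge", "uk", "college"],
    ["cambridge", "massachusetts", "mit", "boston"],
    ["cambridge", "massachusetts", "harvard", "boston"],
    ["cambridge", "massachusetts", "harvard"],
]


def cooccurrence_counts(photos):
    """Count how often each tag occurs and how often each ordered pair co-occurs."""
    tag_count, pair_count = Counter(), Counter()
    for tags in photos:
        unique = sorted(set(tags))
        tag_count.update(unique)
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    return tag_count, pair_count


def context_distribution(tag, pair_count, support, smoothing=0.5):
    """Estimate P(w | tag) over `support` with additive smoothing (an assumption)."""
    total = sum(pair_count[(tag, w)] + smoothing for w in support)
    return {w: (pair_count[(tag, w)] + smoothing) / total for w in support}


def symmetric_kl(p, q):
    """Symmetrized KL divergence between two distributions on the same support."""
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)


def suggest_disambiguating_pair(original, photos, min_cooccur=2, threshold=1.0):
    """Return two tags that both co-occur with `original` but whose contexts
    diverge by more than `threshold`, or None if no such pair exists."""
    tag_count, pair_count = cooccurrence_counts(photos)
    vocab, original = set(tag_count), set(original)
    # Candidates must co-occur often enough with every original tag.
    candidates = sorted(
        t for t in vocab - original
        if all(pair_count[(o, t)] >= min_cooccur for o in original)
    )
    best, best_score = None, threshold
    for a, b in combinations(candidates, 2):
        support = vocab - original - {a, b}
        if not support:
            continue
        score = symmetric_kl(
            context_distribution(a, pair_count, support),
            context_distribution(b, pair_count, support),
        )
        if score > best_score:
            best, best_score = (a, b), score
    return best


if __name__ == "__main__":
    print(suggest_disambiguating_pair(["cambridge"], PHOTOS))
```

On this toy corpus the tag "cambridge" is split between a UK context and a Massachusetts context, so the function returns one tag from each side. When no candidate pair clears the threshold the function returns `None`, which corresponds to not querying the user at all.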
