Automatic Tag Recommendation for Weblogs

Tag recommendation for weblogs has been studied extensively. In this paper, we propose a novel automatic tag recommendation algorithm that handles large-scale, real-time data effectively and efficiently. Most existing work on tag suggestion first mines the relationship between testing and training data and then assigns the top-ranked tags of the most related training data to the test object. However, these approaches ignore the internal relationship between tags and weblogs. According to our study, more than 43% of the tags assigned by weblog users actually appear in the body of the text. Moreover, the term frequency distribution, the paragraph frequency distribution, and the first occurrence position of tags differ markedly from those of non-tag words in the text. In this paper, the tags of a weblog are assigned in two steps. First, probability distributions over these word attributes are learned from labeled training weblogs, and keywords of a test weblog are extracted as the first part of the tags based on these distributions. Then, the remaining tags are derived from the first part with the help of a Latent Semantic Indexing (LSI) model. Experiments on a large-scale weblog tagging dataset show that the average tagging time for a new weblog is less than 0.02 seconds, and that over 74% of test weblogs are correctly labeled with the top 15 tags.
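To make the first step more concrete, the following Python sketch computes the three word attributes named above (term frequency, paragraph frequency, and first occurrence position) and ranks candidate keywords by how tag-like their attributes are. It is only an illustration under stated assumptions, not the algorithm evaluated in the paper: the function names, the tokenizer, the blank-line paragraph split, the attribute binning, and the smoothed naive-Bayes-style log-likelihood ratio are all hypothetical choices. The LSI step is omitted because it would require a linear algebra library.

```python
import math
import re
from collections import Counter

NUM_BINS = 5  # assumed upper bound on bins per attribute, used for smoothing


def word_attributes(text):
    """Return {word: (term_freq, paragraph_freq, first_position)} for one weblog.

    first_position is the relative offset of the word's first occurrence
    (0.0 means the very first word). Paragraphs are assumed to be
    separated by blank lines.
    """
    tf, pf, first = Counter(), Counter(), {}
    index = 0
    for para in text.split("\n\n"):
        words = re.findall(r"[a-z0-9]+", para.lower())
        tf.update(words)
        pf.update(set(words))  # count each word once per paragraph
        for w in words:
            first.setdefault(w, index)
            index += 1
    total = max(index, 1)
    return {w: (tf[w], pf[w], first[w] / total) for w in tf}


def bucket(attrs):
    """Discretize attributes (assumed binning: counts capped at 5, position in quarters)."""
    tf, pf, pos = attrs
    return (min(tf, 5), min(pf, 5), int(pos * 4))


def train(labeled):
    """Count attribute bins separately for tag words and non-tag words.

    labeled is an iterable of (text, tags) pairs.
    """
    counts = {True: [Counter() for _ in range(3)],
              False: [Counter() for _ in range(3)]}
    for text, tags in labeled:
        tagset = {t.lower() for t in tags}
        for w, attrs in word_attributes(text).items():
            for i, b in enumerate(bucket(attrs)):
                counts[w in tagset][i][b] += 1
    return counts


def score(attrs, counts):
    """Smoothed log-likelihood ratio of 'tag' versus 'non-tag' (an assumed scoring rule)."""
    s = 0.0
    for i, b in enumerate(bucket(attrs)):
        for is_tag, sign in ((True, 1), (False, -1)):
            c = counts[is_tag][i]
            s += sign * math.log((c[b] + 1) / (sum(c.values()) + NUM_BINS))
    return s


def extract_keywords(text, counts, k=5):
    """Return the k words of text whose attributes look most tag-like."""
    attrs = word_attributes(text)
    return sorted(attrs, key=lambda w: score(attrs[w], counts), reverse=True)[:k]
```

In this sketch, `train` would be run once over the labeled training weblogs, and `extract_keywords` would then be called for each new weblog; the returned keywords correspond to the first part of the tags, from which the second step would derive the remaining ones.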
