Blog Classification Using Tags: An Empirical Study

With the exponential growth of weblogs (or blogs), many blog directories have appeared to help users locate topical blogs. As tags are commonly used to describe blogs, we study the effectiveness of tags in blog classification. Our experiments on 24,247 blogs showed that tags could lead to better classification accuracy than titles and descriptions. Interestingly, more tags did not necessarily lead to better classification accuracy. To better describe blogs, we also propose a tag expansion algorithm that assigns a blog additional tags that often co-occur with those already associated with it. Our experiments showed that tag expansion helped to improve the recall of blog classification at the cost of reduced precision.
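The abstract does not spell out the tag expansion algorithm, so the following is only a minimal sketch of one way co-occurrence-based expansion could work, not the authors' method. The function names (`build_cooccurrence`, `expand_tags`) and the parameters `top_k` and `min_count` are hypothetical, introduced purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(tagged_blogs):
    """Count how often each pair of tags appears on the same blog."""
    cooc = defaultdict(Counter)
    for tags in tagged_blogs:
        # Deduplicate so a repeated tag on one blog is counted once.
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_tags(tags, cooc, top_k=2, min_count=2):
    """Return the original tags plus up to top_k co-occurring tags.

    Assumption: a candidate tag is scored by summing its co-occurrence
    counts with the blog's existing tags; the paper may score differently.
    """
    scores = Counter()
    for tag in set(tags):
        for other, count in cooc.get(tag, {}).items():
            if other not in tags:
                scores[other] += count
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [t for t, c in ranked if c >= min_count][:top_k]
    return list(tags) + extra
```

Raising `top_k` or lowering `min_count` in this sketch adds more, but weaker, tags to each blog, which is consistent with the trade-off reported above: expansion tends to help recall while costing precision.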
