Using Tags and Clustering to Identify Topic-Relevant Blogs

The Web has experienced exponential growth in the use of weblogs, or blogs. Blog entries are generally organised using tags: informally defined labels that are increasingly being proposed as a ‘grassroots’ answer to Semantic Web standards. Despite this, tags have been shown to be weak at partitioning blog data. In this paper, we demonstrate how tags provide useful, discriminating information when the blog corpus is first partitioned using a conventional clustering technique. Through extensive empirical evaluation, we show that the tag cloud information within each cluster allows us to identify the most topic-relevant blogs in that cluster. We conclude that tags have a key auxiliary role in refining and confirming the information produced by typical knowledge discovery techniques.
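To make the idea concrete, the following is a minimal sketch of how tag information inside a single cluster could be used to rank its blogs. It is an illustration only: the abstract does not specify the scoring function, so the choices here (counting how many blogs in the cluster use each tag, keeping only tags shared by at least two blogs, and scoring a blog by the summed counts of the cloud tags it uses) are assumptions, as are the function name `rank_blogs_by_cluster_tags`, the `top_n` parameter, and the toy data. The cluster itself is assumed to have been produced beforehand by a conventional clustering step.

```python
def rank_blogs_by_cluster_tags(cluster_blogs, top_n=3):
    """Rank the blogs of one cluster by overlap with the cluster's tag cloud.

    cluster_blogs: dict mapping a blog id to the set of tags it uses.
    Hypothetical scoring, not the method evaluated in the paper.
    """
    # Count, for each tag, how many blogs in the cluster use it.
    tag_counts = {}
    for tags in cluster_blogs.values():
        for tag in set(tags):
            tag_counts[tag] = tag_counts.get(tag, 0) + 1

    # Keep only tags shared by at least two blogs, then take the top_n
    # most widely shared ones (ties broken alphabetically).
    shared = [(tag, n) for tag, n in tag_counts.items() if n >= 2]
    shared.sort(key=lambda item: (-item[1], item[0]))
    cloud = dict(shared[:top_n])

    # Score each blog by the summed counts of the cloud tags it uses.
    scores = {
        blog: sum(cloud.get(tag, 0) for tag in set(tags))
        for blog, tags in cluster_blogs.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return cloud, ranked


# Toy cluster: three blogs share tags, one does not.
cluster = {
    "blog_a": {"python", "programming", "web"},
    "blog_b": {"python", "programming"},
    "blog_c": {"python", "cats"},
    "blog_d": {"holiday"},
}
cloud, ranked = rank_blogs_by_cluster_tags(cluster)
print(cloud)
print(ranked)
```

In this toy cluster, blogs whose tags coincide with the widely shared tags rank first, while a blog with only an idiosyncratic tag scores zero, which is the sense in which tags can refine a cluster produced by other means.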
