An online blog reading system by topic clustering and personalized ranking

A growing number of people read, write, and comment on blogs. According to a recent Technorati survey, about 75,000 new blogs and 1.2 million new posts appear every day. Finding the most interesting posts in this huge and dynamic blog world is therefore difficult and time consuming for a reader. This article proposes an online Personalized Blog Reader (PBR) system, which helps readers browse the coolest and newest blog posts matching their interests by automatically clustering the most relevant stories. PBR aims to always rank a user's potential favorite topics above the nonfavorite ones. It does so in three steps. First, the system collects posts from different blogs and provides a unified incremental index over them. Second, an incremental clustering algorithm with a flexible half-bounded window of observation is proposed to satisfy the requirements of online processing. Third, the system learns each user's personalized reading preferences and presents the user with a final reading list. The experimental results show that the proposed incremental clustering algorithm is effective and efficient, and that the personalization of PBR performs well.
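The pipeline described above can be illustrated with a minimal sketch. It is not the PBR algorithm itself: the abstract does not define the half-bounded window, so the code assumes a simple recency window that drops clusters not updated within `window` time steps. The single-pass cosine-similarity clustering, the term-count profile used for ranking, and every name and parameter (`IncrementalClusterer`, `rank_clusters`, `threshold`, `window`) are likewise illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering over a post stream (hypothetical stand-in for PBR)."""

    def __init__(self, threshold=0.3, window=5):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.window = window        # assumed window length, in time steps
        self.clusters = []          # each: {"centroid", "posts", "last_seen"}

    def add(self, post_id, text, t):
        # Assumed window rule: forget clusters not updated within `window` steps.
        self.clusters = [c for c in self.clusters
                         if t - c["last_seen"] <= self.window]
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for c in self.clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= self.threshold:
            # Merge the post into the closest existing cluster.
            best["centroid"].update(vec)
            best["posts"].append(post_id)
            best["last_seen"] = t
        else:
            # No cluster is close enough: start a new topic.
            best = {"centroid": vec, "posts": [post_id], "last_seen": t}
            self.clusters.append(best)
        return best


def rank_clusters(clusters, profile):
    """Order clusters by similarity to a user's term-count profile."""
    return sorted(clusters,
                  key=lambda c: cosine(profile, c["centroid"]),
                  reverse=True)
```

In this sketch each arriving post is compared only with clusters still inside the window, so the per-post cost depends on the number of live clusters rather than on the full history. A user profile could be built, for example, by summing the term counts of posts the user has read; that choice is also an assumption, not something the abstract specifies.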
