Hybrid Approach for Construction of Summaries and Clusters of Blog Data for Improved Blog Search Results

Blog data are noisy: entries are unstructured and may cover a wide variety of topics. Because of the sheer number of blogs that exist, viewing and examining them manually is difficult and time-consuming. Intuitively, existing text and Web mining techniques could be applied to blog analysis and mining, but several challenges prevent applying them directly. Bloggers update their content much more frequently than Webmasters update traditional Web pages, often daily or even hourly. Above all, bloggers cover very diverse topics, so only one paragraph in a particular entry may relate to a reader's topic of interest. In this paper we propose an architecture that takes a query from the user, processes it through a blog parser, and extracts the content from the blog page. A blog analyzer then identifies the sentences that should be kept for further processing, and the content is summarized based on the analysis results. The process is repeated for all the blogs, producing summarized output of clustered blogs.
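The stages named in the abstract (parser, analyzer, summarizer, clustering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `BlogParser`, `analyze`, `summarize`, and `cluster` are hypothetical, sentence scoring by query-term overlap stands in for whatever analysis the paper actually uses, and the greedy cosine-similarity grouping with a `threshold` parameter is a simple substitute for the paper's clustering method.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class BlogParser(HTMLParser):
    """Hypothetical parser stage: collects visible text, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = BlogParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(text, query):
    """Assumed analyzer: score each sentence by the number of query-term hits."""
    terms = set(tokens(query))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(sum(t in terms for t in tokens(s)), i, s) for i, s in enumerate(sentences)]


def summarize(html, query, k=2):
    """Keep the k best-scoring sentences, emitted in their original order."""
    scored = [x for x in analyze(extract_text(html), query) if x[0] > 0]
    top = sorted(scored, key=lambda x: (-x[0], x[1]))[:k]
    return " ".join(s for _, _, s in sorted(top, key=lambda x: x[1]))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(summaries, threshold=0.3):
    """Greedy single pass: a summary joins the first cluster whose seed is similar enough."""
    clusters = []  # each item is (seed term-count vector, member indices)
    for i, text in enumerate(summaries):
        vec = Counter(tokens(text))
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]
```

In this sketch, `summarize` would be called once per blog page for the user's query, and the resulting summaries passed to `cluster`, mirroring the "repeat for all blogs, then output clustered summaries" flow described above.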
