Utilizing Wikipedia as a Knowledge Source in Categorizing Topic-Related Korean Blogs into Facets

As blog services and blog tools become increasingly popular, people are able to express their own interests and opinions on the Web. Search engines are then used to access the varied information found in the blogosphere: given a search query, a ranked list of blog posts is returned as the search result. However, a ranked list is usually not helpful for a user who wants to quickly identify the blog posts that satisfy his/her information need. This is especially true when the search result for a query is a mixture of blog posts that focus on various sub-topics. In such a situation, the framework of faceted search [8], which has been well studied in the information retrieval community, can be a solution. In this paper, we propose a framework for categorizing Korean blog posts according to their sub-topics, where the blog posts are collected from the Korean blogosphere for a given search query. In our framework, the sub-topic of each blog post is regarded as a facet of an initial topic keyword, and a facet is automatically assigned to each blog post. For example, Figure 1 illustrates a result of faceted search for the initial topic keyword “global warming” within the Korean blogosphere. In this result, the collected blog posts regarding “global warming” are categorized into facets by identifying each blogger’s interest in a blog post. This facet assignment is realized by utilizing Wikipedia entries as a knowledge source, with each Wikipedia entry title treated as a facet label. In the evaluation, our framework achieves about 50–70% accuracy.
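
The facet assignment step can be illustrated with a minimal sketch. The abstract does not specify how a blog post is matched against Wikipedia entries, so the term-overlap scoring below is an assumption made purely for illustration and is not the method of the paper. The function name `assign_facet`, the `facet_terms` mapping (Wikipedia entry title to a set of related terms), and the English toy data are all hypothetical. Whitespace tokenization is also a simplification; Korean text would likely need a morphological analyzer.

```python
from typing import Dict, Optional, Set


def assign_facet(post: str, facet_terms: Dict[str, Set[str]]) -> Optional[str]:
    """Return the facet label whose related terms overlap most with the post.

    Hypothetical scoring: count how many of a facet's related terms occur
    among the whitespace-separated tokens of the post. Returns None when
    no facet shares any term with the post.
    """
    tokens = set(post.lower().split())
    best_label, best_score = None, 0
    for label, terms in facet_terms.items():
        score = len(tokens & {t.lower() for t in terms})
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (assumed): each key stands in for a Wikipedia entry title used
# as a facet label; each value stands in for terms related to that entry.
facet_terms = {
    "Kyoto Protocol": {"kyoto", "protocol", "treaty", "emissions"},
    "Sea level rise": {"sea", "level", "glacier", "flooding"},
}

posts = [
    "The treaty sets binding emissions targets for each country",
    "Melting glacier ice is raising the sea level",
    "My favorite recipes for the weekend",
]

facets = [assign_facet(p, facet_terms) for p in posts]
for post, facet in zip(posts, facets):
    print(facet, "<-", post)
```

In this sketch a post that shares no term with any facet is left unassigned rather than forced into a category, which is one possible design choice among several.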

[1] Daniel Tunkelang, et al. Faceted Search, 2009, Synthesis Lectures on Information Concepts, Retrieval, and Services.

[2] Craig MacDonald, et al. Overview of the TREC 2009 Blog Track, 2009, TREC.

[3] Gautam Das, et al. Facetedpedia: dynamic generation of query-dependent faceted interfaces for wikipedia, 2010, WWW '10.

[4] Sadao Kurohashi, et al. Summarizing Search Results using PLSI, 2010.

[5] Ryoji Kataoka, et al. Search Result Clustering Using Informatively Named Entities, 2007, Int. J. Hum. Comput. Interact.

[6] K. Fujimura, et al. BLOGRANGER – A Multi-faceted Blog Search Engine, 2006.

[7] Sadao Kurohashi, et al. Web Information Organization Using Keyword Distillation Based Clustering, 2009, 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology.