Ontology-Based Topic Extraction Service from Weblogs

Consumer Generated Media (CGM) has a significant impact on companies' product marketing strategies. This paper describes the development of WOM Scouter, an opinion analysis service for marketing research. We then present an algorithm for associated topic extraction, one of the main features of WOM Scouter. Associated topic extraction finds competing products in blog entries that comment on a specified product. Its main feature is the use of a product ontology in addition to natural language processing. By looking up each term in the product ontology, the algorithm identifies the product domain of a blog entry and excludes general nouns. Another feature is that the importance of each product is evaluated by means of two kinds of smoothing functions, based on link popularity and document frequency. The experimental evaluation shows that the results of the proposed algorithm are closer to blog readers' impressions than those of TF-IDF.
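
The following is a minimal sketch of the idea behind associated topic extraction, not the WOM Scouter implementation. The abstract does not specify the ontology format or the two smoothing functions, so everything concrete here is an assumption made for illustration: the toy ontology `PRODUCT_ONTOLOGY`, the product names, the per-entry inbound link count, and the use of `log1p` as the smoothing function for both document frequency and link popularity.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology mapping a product name to its product domain.
# A real product ontology would be far larger and richer than this.
PRODUCT_ONTOLOGY = {
    "alphaphone": "smartphone",
    "betaphone": "smartphone",
    "gammaphone": "smartphone",
    "deltacam": "camera",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def associated_products(entries, target):
    """Rank products that co-occur with `target` in blog entries.

    `entries` is a list of (text, inbound_link_count) pairs.
    Only terms found in the ontology under the same domain as `target`
    are kept, so general nouns and other domains are excluded.
    """
    domain = PRODUCT_ONTOLOGY[target]
    doc_freq = Counter()   # number of entries mentioning each product
    link_sum = Counter()   # total inbound links of those entries

    for text, links in entries:
        terms = set(tokenize(text))
        if target not in terms:
            continue  # only entries commenting on the specified product
        for term in terms:
            if term != target and PRODUCT_ONTOLOGY.get(term) == domain:
                doc_freq[term] += 1
                link_sum[term] += links

    # Assumed smoothing: log1p dampens both raw counts so that a single
    # heavily linked or frequently repeated entry does not dominate.
    scores = {
        term: math.log1p(doc_freq[term]) * (1.0 + math.log1p(link_sum[term]))
        for term in doc_freq
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Called with a list of entries and the target `"alphaphone"`, the function returns the other smartphones mentioned alongside it, ordered by score. Terms such as "battery" are dropped because they are absent from the ontology, and `"deltacam"` is dropped because it belongs to a different product domain.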
