CCRM: An Effective Algorithm for Mining Commodity Information from Threaded Chinese Customer Reviews

This paper addresses the problem of mining commodity information from threaded Chinese customer reviews. Chinese online commodity forums, which are developing rapidly, provide a good environment for customers to share reviews. However, because of noise and navigational limitations, it is hard to gain a clear view of a commodity from thousands of related reviews. Furthermore, because Chinese and English differ in their linguistic characteristics, research approaches for the two languages may differ considerably. This paper aims to automatically mine key information from commodity reviews. We propose an effective algorithm, the Chinese Commodity Review Miner (CCRM), which consists of two parts. First, we propose an efficient rule-based algorithm for commodity feature extraction, together with a probabilistic model for feature ranking. Second, we propose a top-down algorithm that reorganizes the extracted features into a hierarchical structure. A prototype system based on CCRM is also implemented. Using CCRM, users can easily obtain an outline of a commodity and navigate freely within it.
