Mining product reputations on the Web

Knowing the reputations of your own and/or competitors' products is important for marketing and customer relationship management. It is, however, very costly to collect and analyze survey data manually. This paper presents a new framework for mining product reputations on the Internet. It automatically collects people's opinions about target products from Web pages, and it uses text mining techniques to obtain the reputations of those products.

On the basis of human-test samples, we generate in advance syntactic and linguistic rules to determine whether any given statement is an opinion or not, as well as whether any such opinion is positive or negative in nature. We first collect statements regarding target products using a general search engine. Then, using the rules, we extract opinions from among them and attach three labels to each opinion: the positive/negative determination, the product name itself, and a numerical value expressing the degree of system confidence that the statement is, in fact, an opinion. The labeled opinions are then input into an opinion database.

The mining of reputations, i.e., the finding of statistically meaningful information included in the database, is then conducted. We specify target categories using label values (such as positive opinions of product A) and perform four types of text mining: extraction of 1) characteristic words, 2) co-occurrence words, and 3) typical sentences for individual target categories, plus 4) correspondence analysis among multiple target categories.

Actual marketing data is used to demonstrate the validity and effectiveness of the framework, which offers a drastic reduction in the overall cost of reputation analysis compared with conventional survey approaches and supports the discovery of knowledge from the pool of opinions on the Web.
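The labeling step described in the abstract (attach polarity, product name, and a confidence value to each extracted opinion, then query the opinion database by label values) can be illustrated with a minimal sketch. It assumes toy keyword cues in place of the framework's syntactic and linguistic rules, which the abstract does not spell out; `LabeledOpinion`, `label_statement`, `select`, the cue sets, and the confidence formula are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical polarity cues standing in for the framework's rules,
# which are generated from human-test samples and not reproduced here.
POSITIVE_CUES = {"great", "excellent", "love", "reliable"}
NEGATIVE_CUES = {"terrible", "broken", "hate", "slow"}
# Hypothetical first-person markers used as a crude "this is an opinion" signal.
OPINION_MARKERS = {"i", "my", "think", "feel"}


@dataclass
class LabeledOpinion:
    text: str
    product: str       # label: the product name itself
    polarity: str      # label: "positive" or "negative"
    confidence: float  # label: confidence that the statement is an opinion


def label_statement(text: str, product: str) -> Optional[LabeledOpinion]:
    """Return a labeled opinion, or None if the statement is not judged to be one."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE_CUES for w in words)
    neg = sum(w in NEGATIVE_CUES for w in words)
    if pos == neg:  # no polarity cue, or a tie: treat as a non-opinion
        return None
    markers = sum(w in OPINION_MARKERS for w in words)
    # Toy confidence: more net polarity cues and opinion markers raise it, capped at 1.0.
    confidence = min(1.0, 0.4 + 0.2 * abs(pos - neg) + 0.1 * markers)
    polarity = "positive" if pos > neg else "negative"
    return LabeledOpinion(text, product, polarity, round(confidence, 2))


def select(db: List[LabeledOpinion], product: str, polarity: str) -> List[LabeledOpinion]:
    """Pick out one target category, e.g. positive opinions of product A."""
    return [op for op in db if op.product == product and op.polarity == polarity]


if __name__ == "__main__":
    statements = [
        ("I love the battery, it is reliable", "A"),
        ("The screen is terrible", "A"),
        ("Released in May", "A"),
        ("My unit arrived broken and slow", "B"),
    ]
    labeled = [label_statement(text, product) for text, product in statements]
    db = [op for op in labeled if op is not None]  # the "opinion database"
    for op in select(db, "A", "positive"):
        print(op.polarity, op.confidence, op.text)  # positive 0.9 I love the battery, it is reliable
```

Keeping the three labels as plain fields makes a target category nothing more than a filter over label values, which is what the later mining steps operate on.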

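The first two mining tasks, characteristic words and co-occurrence words for a target category, can be sketched as follows. The abstract does not say how words are scored, so this sketch swaps in a smoothed log-odds ratio of document frequencies (target category versus all other opinions) and plain per-opinion co-occurrence counts; `characteristic_words`, `cooccurrence_words`, and the sample opinions are hypothetical illustrations, not the paper's measures or data.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word set of one opinion (document frequency, not raw counts)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def characteristic_words(target: List[str], rest: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank words by smoothed log-odds of appearing in target opinions vs. the rest."""
    t_df = Counter(w for doc in target for w in tokens(doc))
    r_df = Counter(w for doc in rest for w in tokens(doc))
    n_t, n_r = len(target), len(rest)
    scores = {}
    for w, a in t_df.items():
        b = r_df[w]  # 0 if the word never appears outside the target category
        # Add-one smoothing keeps both odds finite and nonzero.
        scores[w] = math.log((a + 1) / (n_t - a + 1)) - math.log((b + 1) / (n_r - b + 1))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def cooccurrence_words(target: List[str], keyword: str, k: int = 3) -> List[Tuple[str, int]]:
    """Count words that appear in the same target opinion as `keyword`."""
    counts: Counter = Counter()
    for doc in target:
        words = tokens(doc)
        if keyword in words:
            counts.update(words - {keyword})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    positive_a = ["battery life is great", "great battery, great screen", "screen is sharp"]
    others = ["battery died fast", "screen is dim", "support is slow"]
    print(characteristic_words(positive_a, others, k=1)[0][0])  # great
    print(cooccurrence_words(positive_a, "battery", k=1))  # [('great', 2)]
```

Counting each word at most once per opinion keeps a single emphatic review from dominating the scores. A practical version would also drop stopwords such as "is", which score near zero here only because they are equally common in both groups.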