Efficient Confident Search in Large Review Corpora

Given an extensive corpus of reviews on an item, a potential customer goes through the expressed opinions and collects information, in order to form an educated opinion and, ultimately, make a purchase decision. This task is often hindered by false reviews, that fail to capture the true quality of the item's attributes. These reviews may be based on insufficient information or may even be fraudulent, submitted to manipulate the item's reputation. In this paper, we formalize the Confident Search paradigm for review corpora. We then present a complete search framework which, given a set of item attributes, is able to efficiently search through a large corpus and select a compact set of high-quality reviews that accurately captures the overall consensus of the reviewers on the specified attributes. We also introduce CREST (Confident REview Search Tool), a user-friendly implementation of our framework and a valuable tool for any person dealing with large review corpora. The efficacy of our framework is demonstrated through a rigorous experimental evaluation.

[1]  Bing Liu,et al.  Mining Opinion Features in Customer Reviews , 2004, AAAI.

[2]  Vasek Chvátal,et al.  A Greedy Heuristic for the Set-Covering Problem , 1979, Math. Oper. Res..

[3]  Siddharth Patwardhan,et al.  Feature Subsumption for Opinion Analysis , 2006, EMNLP.

[4]  Zhu Zhang,et al.  Utility scoring of product reviews , 2006, CIKM '06.

[5]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[6]  Matteo Fischetti,et al.  Algorithms for the Set Covering Problem , 2000, Ann. Oper. Res..

[7]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[8]  Bernhard Seeger,et al.  Progressive skyline computation in database systems , 2005, TODS.

[9]  Ming Zhou,et al.  Low-Quality Product Review Detection in Opinion Summarization , 2007, EMNLP.

[10]  Donald Kossmann,et al.  The Skyline operator , 2001, Proceedings 17th International Conference on Data Engineering.

[11]  Soo-Min Kim,et al.  Automatically Assessing Review Helpfulness , 2006, EMNLP.

[12]  Hsin-Hsi Chen,et al.  Opinion Extraction, Summarization and Tracking in News and Blog Corpora , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[13]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[14]  Jan Chomicki,et al.  Skyline with presorting , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[15]  Rayid Ghani,et al.  Text mining for product attribute extraction , 2006, SKDD.

[16]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[17]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[18]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[19]  Oren Etzioni,et al.  Extracting Product Features and Opinions from Reviews , 2005, HLT.

[20]  Matteo Fischetti,et al.  A Heuristic Algorithm for the Set Covering Problem , 1996, IPCO.

[21]  Bing Liu,et al.  Opinion spam and analysis , 2008, WSDM '08.

[22]  Xiaoyan Zhu,et al.  Movie review mining and summarization , 2006, CIKM '06.

[23]  Panagiotis G. Ipeirotis,et al.  Show me the money!: deriving the pricing power of product features by mining consumer reviews , 2007, KDD '07.