Selecting a characteristic set of reviews

Online reviews provide consumers with valuable information that guides their decisions on a variety of fronts, from entertainment and shopping to medical services. Although the proliferation of online reviews gives insights about different aspects of a product, it can also prove a serious drawback: consumers cannot and will not read thousands of reviews before making a purchase decision. The need to extract useful information from large review corpora has spawned considerable prior work, but the existing approaches all have drawbacks. Review summarization (generating statistical descriptions of review sets) sacrifices the immediacy and narrative structure of reviews. Likewise, review selection (identifying a subset of 'helpful' or 'important' reviews) leads to redundant or non-representative summaries. In this paper, we fill the gap between existing review-summarization and review-selection methods by selecting a small subset of reviews that together preserve the statistical properties of the entire review corpus. We formalize this task as a combinatorial optimization problem and show that it is NP-hard both to solve and to approximate. We also design algorithms that work well in practice. Our experiments with real review corpora on different types of products demonstrate the utility of our methods, and our user studies indicate that our methods provide a better summary than prior approaches.
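To make the selection task concrete, the following is a minimal sketch of one possible greedy heuristic. It is not the algorithm from the paper, whose exact objective and methods are not given in the abstract. It assumes, purely for illustration, that each review is encoded as a numeric feature vector (for example, 0/1 indicators of which product attributes it mentions) and that "preserving the statistical properties" means keeping the subset's mean vector close to the corpus mean. The function names and the squared-distance objective are hypothetical.

```python
from typing import List, Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(dim)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_characteristic_subset(
    reviews: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Greedily pick up to k review indices whose mean feature vector
    stays close to the mean of the whole corpus.

    Assumption (not from the paper): the objective is the squared
    distance between the subset mean and the corpus mean.
    """
    target = mean_vector(reviews)
    chosen: List[int] = []
    sums = [0.0] * len(target)  # running component sums of chosen reviews
    remaining = set(range(len(reviews)))

    for _ in range(min(k, len(reviews))):
        best_idx, best_cost = None, float("inf")
        # Iterate in index order so ties resolve to the lowest index.
        for i in sorted(remaining):
            size = len(chosen) + 1
            candidate = [(s + v) / size for s, v in zip(sums, reviews[i])]
            cost = squared_distance(candidate, target)
            if cost < best_cost:
                best_idx, best_cost = i, cost
        chosen.append(best_idx)
        remaining.remove(best_idx)
        sums = [s + v for s, v in zip(sums, reviews[best_idx])]

    return chosen


# Toy corpus: two reviews mention attribute 0, two mention attribute 1.
toy_corpus = [[1, 0], [1, 0], [0, 1], [0, 1]]
selected = greedy_characteristic_subset(toy_corpus, 2)
```

On the toy corpus, a subset of two reviews can match the corpus mean exactly by taking one review of each kind. A greedy pass like this carries no optimality guarantee, which is consistent with the hardness result stated above.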
