Providing consumers with a representative subset from online reviews

Purpose: This paper aims to find a representative subset of large-scale online reviews for consumers. The subset is much smaller than the original review set, yet it covers most of the information in the original reviews and contains little redundant information.

Design/methodology/approach: A heuristic approach named RewSel is proposed, which successively selects representative reviews until the required number of representatives is reached. To demonstrate the advantages of the approach, extensive experiments and a user study are conducted on real data.

Findings: The proposed approach outperforms the benchmarks in terms of coverage and redundancy. Users prefer the representative subsets provided by RewSel. The proposed approach also scales well, which makes it better suited to big data applications.

Research limitations/implications: The paper contributes to the literature on review selection by proposing a heuristic approach that achieves both high coverage and low redundancy. This study can serve as a basis for further analysis of large-scale online reviews.

Practical implications: The proposed approach offers a novel way to select a representative subset of online reviews to facilitate consumer decision making. It can also enhance existing information retrieval systems so that they provide representative information to users rather than a large number of results.

Originality/value: The proposed approach finds the representative subset by adopting the concept of relative entropy together with sentiment analysis methods. Compared with state-of-the-art approaches, it offers a more effective and efficient way for users to handle a large amount of online information.
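The abstract does not spell out the RewSel procedure, so the following is only a minimal Python sketch of the general idea it names: picking reviews one at a time until the required number is reached, guided by relative entropy (KL divergence). Everything concrete here is an assumption made for illustration, not the authors' method: each review is assumed to be already reduced to a list of (feature, sentiment) pairs by some sentiment analysis step, and the hypothetical helpers `opinion_distribution`, `kl_divergence` and `greedy_select`, as well as the smoothing constant `eps`, are invented names.

```python
import math
from collections import Counter


def opinion_distribution(reviews):
    """Relative frequency of each (feature, sentiment) pair across the given reviews."""
    counts = Counter(pair for review in reviews for pair in review)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}


def kl_divergence(p, q, eps=1e-9):
    """Relative entropy KL(p || q). q is smoothed with eps (an assumption of this
    sketch) so that opinions missing from the subset are penalised heavily
    instead of producing a division by zero."""
    return sum(pv * math.log(pv / (q.get(pair, 0.0) + eps)) for pair, pv in p.items())


def greedy_select(reviews, k):
    """Greedily pick up to k review indices whose pooled opinion distribution
    stays closest, in relative entropy, to that of the full review set."""
    target = opinion_distribution(reviews)
    chosen = []
    remaining = list(range(len(reviews)))
    while remaining and len(chosen) < k:
        # Try each remaining review and keep the one that lowers the divergence most.
        best = min(
            remaining,
            key=lambda i: kl_divergence(
                target, opinion_distribution([reviews[j] for j in chosen + [i]])
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy data: each review is a list of (feature, sentiment) pairs.
    toy_reviews = [
        [("battery", "+"), ("screen", "+")],
        [("battery", "+")],
        [("battery", "+")],
        [("price", "-")],
        [("battery", "+"), ("price", "-")],
    ]
    print(greedy_select(toy_reviews, 2))
```

In this toy setup, a review that merely repeats an already selected opinion does little to reduce the divergence, while a review that adds a missing opinion reduces it sharply, which is one simple way to favour coverage and discourage redundancy at the same time. Each greedy step re-scores every remaining review, so a version meant for large review sets would need incremental updates of the pooled counts.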
