Real‐time helpfulness prediction based on voter opinions

This paper studies the problem of designing real‐time helpfulness prediction algorithms. Instead of following the conventional route, in which the fraction of positive votes is used as the measure of helpfulness, we give ‘helpfulness’ a mathematically precise definition: the probability that a user will vote ‘helpful’ on the user‐generated content. Building on this definition, we introduce a principled methodology for helpfulness prediction, in which the prediction problem is formulated as an optimization problem. Under this methodology, we first develop a batch (off‐line) algorithm. Experiments on data from Amazon.com suggest that the proposed model outperforms support vector regression, the previously reported prediction algorithm. In some circumstances, an online algorithm that can update the model as additional data arrive is required. We therefore propose an online algorithm that incrementally updates the parameters of the model. Finally, we provide an efficient hybrid algorithm to increase the convergence rate and prediction precision. The latter two algorithms are tested on real‐life user‐generated content, and experimental results illustrate that the hybrid approach efficiently processes incoming data and generates reliable helpfulness predictions for users. Copyright © 2011 John Wiley & Sons, Ltd.
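
As a rough illustration of the idea summarized above, the sketch below treats helpfulness as a vote probability and updates a model incrementally as vote counts arrive. The abstract does not state the model form, so everything here is an assumption: a logistic link between a feature vector and the vote probability, a binomial likelihood over the observed votes, and a plain stochastic gradient step as the online update. The names `predict`, `online_update`, and `run_stream` are hypothetical, and this is not the paper's batch, online, or hybrid algorithm.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Assumed model: P(vote is 'helpful') = sigmoid(w . x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def online_update(w, x, helpful, total, lr=0.05):
    """One stochastic gradient step on the binomial negative log-likelihood.

    For `helpful` positive votes out of `total`, the loss is
    -[helpful * log(p) + (total - helpful) * log(1 - p)] with p = predict(w, x),
    and its gradient with respect to w is (total * p - helpful) * x.
    """
    p = predict(w, x)
    scale = total * p - helpful
    return [wi - lr * scale * xi for wi, xi in zip(w, x)]


def run_stream(stream, dim, lr=0.05):
    """Process (features, helpful_votes, total_votes) records one at a time."""
    w = [0.0] * dim
    for x, helpful, total in stream:
        w = online_update(w, x, helpful, total, lr)
    return w


if __name__ == "__main__":
    # Toy, made-up records: the second feature marks content that voters like.
    records = [([1.0, 1.0], 9, 10), ([1.0, 0.0], 1, 10)] * 200
    weights = run_stream(records, dim=2)
    print(predict(weights, [1.0, 1.0]), predict(weights, [1.0, 0.0]))
```

Under these assumptions, the predicted value is an estimate of the probability of a 'helpful' vote rather than the raw fraction of positive votes, so an item with few votes contributes a smaller update than one with many.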
