Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification

Incomplete positive labels and large numbers of unlabelled instances are common problems in binary classification applications such as review helpfulness classification. Various studies from the classification literature treat all unlabelled instances as negative examples. However, a classification model that learns from incomplete positive labels while assuming all unlabelled data to be negative examples will often be biased. In this work, we propose a novel Negative Confidence-aware Weakly Supervised approach (NCWS), which customises a binary classification loss function by assigning different negative confidences to the unlabelled examples during the classifier's training. Once integrated into various binary classifiers from the literature, including SVM, CNN and BERT-based classifiers, NCWS enables an effective and unbiased separation of positive and negative instances. We use review helpfulness classification as a test case for examining the effectiveness of our NCWS approach. We thoroughly evaluate NCWS using three different datasets, namely one from Yelp (venue reviews) and two from Amazon (Kindle and Electronics reviews). Our results show that, in addressing the classifier's bias problem, NCWS outperforms strong baselines from the literature, including an existing SVM-based approach (i.e. SVM-P), the positive and unlabelled learning-based approach (i.e. C-PU) and the positive confidence-based approach (i.e. P-conf). Moreover, we further examine the effectiveness of NCWS by using the reviews it classifies as helpful in a state-of-the-art review-based venue recommendation model (i.e. DeepCoNN), and demonstrate the benefits of NCWS in enhancing venue recommendation effectiveness in comparison to the baselines.
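
To illustrate the general idea of weighting unlabelled examples by a negative confidence inside a binary classification loss, the following is a minimal sketch only. It assumes a cross-entropy loss and a per-example confidence in [0, 1]; the function name, its parameters and the exact weighting scheme are hypothetical and are not taken from the NCWS formulation itself.

```python
import math
from typing import Sequence


def negative_confidence_weighted_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    neg_conf: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Illustrative confidence-weighted binary cross-entropy (an assumption,
    not the paper's loss).

    probs:    predicted probability of the positive class for each example
    labels:   1 for a labelled positive, 0 for an unlabelled example
    neg_conf: assumed confidence in [0, 1] that an unlabelled example is
              truly negative (ignored for labelled positives)
    """
    total = 0.0
    for p, y, c in zip(probs, labels, neg_conf):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            # Labelled positives contribute the usual positive-class term.
            total += -math.log(p)
        else:
            # Unlabelled examples contribute a negative-class term scaled by
            # how confident we are that they are really negative.
            total += -c * math.log(1.0 - p)
    return total / len(probs)
```

With every confidence set to 1, this sketch reduces to the ordinary cross-entropy that treats all unlabelled instances as negatives; lowering the confidence of an unlabelled example reduces the penalty for predicting it as positive.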
