Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Prediction

Modern Review Helpfulness Prediction systems rely on multiple modalities, typically text and images. Unfortunately, contemporary approaches pay little attention to refining representations of cross-modal relations and tend to suffer from suboptimal optimization, which can harm the model's predictions in numerous cases. To overcome these issues, we propose a Multimodal Contrastive Learning framework for the Multimodal Review Helpfulness Prediction (MRHP) problem, which exploits the mutual information between input modalities to explicitly model cross-modal relations. In addition, we introduce an Adaptive Weighting scheme for our contrastive learning objective to increase flexibility during optimization. Lastly, we propose a Multimodal Interaction module to address the inherently unaligned nature of multimodal data, thereby helping the model produce more reasonable multimodal representations. Experimental results show that our method outperforms prior baselines and achieves state-of-the-art results on two publicly available benchmark datasets for the MRHP problem.
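To make the core idea concrete, the sketch below shows a generic cross-modal contrastive (InfoNCE-style) objective with a per-sample adaptive weighting, in the spirit described above. This is a minimal illustration, not the paper's implementation: the embedding dimensions, the `temperature` and `alpha` parameters, and the exponential weighting rule are all assumptions chosen for clarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(text_embs, image_embs, temperature=0.07):
    """Cross-modal InfoNCE: each text embedding is pulled toward its
    paired image (same index) and pushed away from the other images
    in the batch. Returns one loss value per pair."""
    losses = []
    for i, t in enumerate(text_embs):
        sims = [cosine(t, v) / temperature for v in image_embs]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        # Negative log-softmax probability of the positive pair.
        losses.append(log_denom - sims[i])
    return losses

def adaptive_weighted_loss(per_sample_losses, alpha=1.0):
    """Hypothetical adaptive weighting: harder pairs (larger loss)
    receive larger weights, normalized so weights average to 1."""
    weights = [math.exp(alpha * l) for l in per_sample_losses]
    mean_w = sum(weights) / len(weights)
    weights = [w / mean_w for w in weights]
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / len(per_sample_losses)
```

As expected of a contrastive objective, correctly paired text/image embeddings yield a much lower loss than mismatched ones, and the adaptive weights emphasize the harder (higher-loss) pairs during optimization.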
