Paper Information - A Multi-modal Multi-task based Approach for Movie Recommendation

Online recommendation systems are in high demand among e-commerce companies and OTT platforms such as Amazon Prime, Netflix, and SonyLiv. As users interact more with these platforms, a recommendation system analyzes their likes and dislikes to predict their preferences and recommend new items likely to capture their attention. In this study, a multi-task architecture is designed to solve the multi-modal movie recommendation problem. Our hypothesis is that jointly solving two related tasks, namely (a) genre classification of movies and (b) rating prediction for a user-movie pair, helps generate good-quality movie embeddings in an end-to-end setting without using a rating vector. Unlike state-of-the-art techniques, movie representations are generated by fusing feature vectors extracted from multiple modalities: the textual summary, the audio and video information present in movie trailers, and meta-data information. Each user is represented by the average of the representations of the movies that the user liked. Three multitasking models, fully shared (FS), shared-private (SP), and adversarial shared-private (ASP), are developed to solve the two tasks simultaneously. For the experiments, MMTF-14K, a multifaceted movie trailer feature dataset, was extended with textual features and meta-data information, and a multi-modal version of the MovieLens-100K dataset was also used. Results for the multitasking models are reported in terms of RMSE and several rank-based metrics. The proposed multi-task model with adversarial training outperforms state-of-the-art models on both the MMTF-14K and the multi-modal MovieLens-100K datasets.
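The following is a minimal sketch of three ideas mentioned in the abstract: fusing per-modality feature vectors into one movie representation, representing a user as the average of the representations of liked movies, and scoring rating predictions with RMSE. The abstract does not state how the modalities are fused, so plain concatenation is used here purely as an assumption. The function names (`fuse_modalities`, `user_representation`, `rmse`) and the toy feature values are hypothetical and are not taken from the paper.

```python
import math


def fuse_modalities(text, audio, video, meta):
    """Fuse per-modality feature vectors into one movie vector.

    Assumption: simple concatenation. The paper's actual fusion
    scheme is not described in the abstract.
    """
    return list(text) + list(audio) + list(video) + list(meta)


def user_representation(movie_vectors, liked_ids):
    """Represent a user as the element-wise mean of liked movie vectors."""
    if not liked_ids:
        raise ValueError("user has no liked movies")
    dim = len(movie_vectors[liked_ids[0]])
    totals = [0.0] * dim
    for movie_id in liked_ids:
        for i, value in enumerate(movie_vectors[movie_id]):
            totals[i] += value
    return [total / len(liked_ids) for total in totals]


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Toy data: 2-d text, 1-d audio, 1-d video, and 1-d meta-data features.
movies = {
    "m1": fuse_modalities([1.0, 0.0], [2.0], [0.0], [1.0]),
    "m2": fuse_modalities([3.0, 2.0], [0.0], [4.0], [1.0]),
}
user_vector = user_representation(movies, ["m1", "m2"])
print(user_vector)                      # [2.0, 1.0, 1.0, 2.0, 1.0]
print(rmse([3.0, 4.0], [4.0, 2.0]))     # square root of 2.5
```

In the multi-task setting described above, vectors like these would feed the genre classification and rating prediction heads; that part is omitted here because the abstract gives no architectural details.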

S. Saha | Prabir Mondal | Subham Raj | N. Onoe | Daipayan Chakder
