A Multi-modal Multi-task based Approach for Movie Recommendation

Online recommendation systems are in high demand among e-commerce businesses and OTT platforms such as Amazon Prime, Netflix, and SonyLiv. As users interact more with these platforms, a recommendation system analyzes their likes and dislikes and tries to predict their preferences in order to recommend new items that may capture their attention. In the current study, a multi-task architecture is designed to solve the multi-modal movie recommendation problem. Our hypothesis is that jointly solving two related tasks, namely (a) genre classification of movies and (b) rating prediction for a user-movie pair, helps generate good-quality movie embeddings in an end-to-end setting without using a rating vector. Unlike state-of-the-art techniques, the movie representation is built by fusing feature vectors extracted from multiple modalities: the textual summary, the audio and video information present in movie trailers, and meta-data. A user is represented by the average of the representations of the movies that the user liked. Three multitasking models, fully shared (FS), shared-private (SP), and adversarial shared-private (ASP), are developed to solve the two tasks simultaneously. For the experiments, two datasets are used: MMTF-14K, a multifaceted movie trailer feature dataset, extended with textual features and meta-data information, and a multi-modal version of the MovieLens-100K dataset. Results of the multitasking models are reported in terms of RMSE and several rank-based metrics. The proposed multi-task model with adversarial training outperforms the state-of-the-art models on both MMTF-14K and the multi-modal version of MovieLens-100K.
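As a rough illustration of three ingredients named in the abstract (modality fusion, the averaged user representation, and RMSE), the following is a minimal sketch and not the authors' implementation. The function names are hypothetical, the toy vectors are made up, and fusion by plain concatenation is an assumption, since the abstract does not say how the modalities are fused.

```python
import math


def fuse_modalities(text, audio, video, meta):
    """Fuse per-modality feature vectors into one movie vector.

    Assumption: simple concatenation; the abstract only says the
    modalities are "fused together".
    """
    return list(text) + list(audio) + list(video) + list(meta)


def user_embedding(liked_movie_embeddings):
    """Represent a user as the element-wise average of liked-movie vectors."""
    n = len(liked_movie_embeddings)
    if n == 0:
        raise ValueError("user has no liked movies")
    dim = len(liked_movie_embeddings[0])
    return [sum(vec[i] for vec in liked_movie_embeddings) / n for i in range(dim)]


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Toy usage with made-up numbers (not taken from MMTF-14K or MovieLens-100K).
movie_a = fuse_modalities([1.0], [2.0], [3.0], [4.0])
movie_b = fuse_modalities([3.0], [4.0], [5.0], [6.0])
user = user_embedding([movie_a, movie_b])
error = rmse([3.0, 4.0], [4.0, 2.0])
```

In the paper's setting the movie vectors would be learned end to end by the multi-task network; here they are fixed lists purely to show the averaging and the error metric.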
