Using Mise-En-Scène Visual Features based on MPEG-7 and Deep Learning for Movie Recommendation

Item features play an important role in movie recommender systems, where recommendations can be generated from users' explicit or implicit preferences on traditional features (attributes) such as tag, genre, and cast. Typically, movie features are human-generated, either editorially (e.g., genre and cast) or by leveraging the wisdom of the crowd (e.g., tag), and as such they are prone to noise and expensive to collect. Moreover, these features are often rare or absent for new items, making it difficult or even impossible to provide good-quality recommendations. In this paper, we show that users' preferences on movies can be better described in terms of mise-en-scène features, i.e., the visual aspects of a movie that characterize design, aesthetics and style (e.g., colors, textures). We use both MPEG-7 visual descriptors and Deep Learning hidden layers as examples of mise-en-scène features that can visually describe movies. Interestingly, mise-en-scène features can be computed automatically from video files or even from trailers, offering more flexibility in handling new items, avoiding the need for costly and error-prone human-based tagging, and providing good scalability. We have conducted a set of experiments on a large catalogue of 4K movies. Results show that recommendations based on mise-en-scène features consistently outperform those based on richer sets of more traditional features, such as genre and tag.
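
As a rough illustration of how item-level visual features could drive content-based recommendation, the following minimal sketch ranks unseen movies by their average cosine similarity to the movies a user liked. It is not the algorithm evaluated in the paper: the feature vectors, the movie identifiers, and the helper names `cosine` and `recommend` are hypothetical, and the three-dimensional vectors merely stand in for real MPEG-7 descriptors or deep-network activations.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(features, liked, top_n=2):
    """Rank items the user has not liked yet by mean similarity to liked items."""
    scores = {}
    for item, vec in features.items():
        if item in liked:
            continue
        scores[item] = sum(cosine(vec, features[j]) for j in liked) / len(liked)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical visual feature vectors (placeholders, not real descriptors).
features = {
    "movie_a": [0.9, 0.1, 0.2],
    "movie_b": [0.8, 0.2, 0.1],
    "movie_c": [0.1, 0.9, 0.7],
    "movie_d": [0.2, 0.8, 0.9],
}

ranked = recommend(features, liked={"movie_a"}, top_n=2)
print(ranked)
```

Because the score depends only on item features, a new movie with no ratings or tags can still be ranked as soon as its visual features have been extracted, which is the cold-start advantage the abstract points to.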
