Predicting movie ratings from audience behaviors

We propose a method for representing audience behavior through the facial and body motions captured in a single video stream, and we use these features to predict the ratings of feature-length movies. This is a challenging problem because: (i) the movie-viewing environment is dark and contains views of people at different scales and viewpoints; (ii) feature-length movies are long (80-120 mins), so tracking people uninterrupted for that duration remains an unsolved problem; and (iii) the expressions and motions of audience members are subtle, short, and sparse, making activity labeling unreliable. To circumvent these issues, we use an infrared-illuminated test-bed to obtain visually uniform input. We then use motion-history features, which capture the subtle movements of a person within a pre-defined volume, and form a group representation of the audience as a histogram of pairwise correlations over a small window of time. Using this group representation, we learn a movie-rating classifier from crowd-sourced ratings collected by rottentomatoes.com and demonstrate our prediction capability on audiences for 30 movies across 250 subjects (>50 hrs of video).
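As a rough illustration of the pipeline described above, the following Python sketch combines a simple motion-history accumulator with a windowed histogram of pairwise correlations between per-subject motion signals, fed to a generic SVM classifier. All parameter values (decay tau, window length, bin count) and the SVM choice are illustrative assumptions, not the exact features or classifier of the method above.

    # Minimal sketch of the described pipeline, using NumPy and scikit-learn.
    # All names and parameter values below are illustrative assumptions.
    import numpy as np
    from sklearn.svm import SVC

    def motion_history(frames, tau=30, thresh=15):
        """Motion-history image over a stack of grayscale frames (T x H x W):
        recently moving pixels are set high, older motion decays linearly."""
        mhi = np.zeros(frames[0].shape, dtype=np.float32)
        for prev, curr in zip(frames[:-1], frames[1:]):
            moving = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
            mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
        return mhi / tau  # normalize to [0, 1]

    def group_histogram(signals, win=120, bins=10):
        """Group feature: histogram of pairwise correlations between per-subject
        motion signals (N subjects x T frames), pooled over sliding windows."""
        n, t = signals.shape
        corrs = []
        for start in range(0, t - win + 1, win):
            c = np.corrcoef(signals[:, start:start + win])  # N x N matrix
            corrs.extend(c[np.triu_indices(n, k=1)])        # unique pairs only
        hist, _ = np.histogram(corrs, bins=bins, range=(-1, 1), density=True)
        return hist

    rng = np.random.default_rng(0)

    # In practice each subject's motion signal would be summarized from MHIs
    # computed within that subject's volume, e.g.:
    demo = motion_history(rng.integers(0, 255, (60, 32, 32), dtype=np.uint8))

    # Toy usage: random motion traces for 20 "audiences" of 8 subjects each,
    # with hypothetical binary ratings (e.g., fresh/rotten).
    X = np.stack([group_histogram(rng.standard_normal((8, 1200))) for _ in range(20)])
    y = rng.integers(0, 2, size=20)
    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict(X[:3]))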
