An Estimator for Rating Video Contents on the Basis of a Viewer's Behavior in Typical Home Environments

A novel method is proposed for predicting a viewer's rating of video content from his or her behavior in a typical home environment. Using the input signal provided by a Kinect sensor, the method first detects the presence of a viewer by extracting key-point trajectories from video sequences of that viewer. It then estimates whether the viewer is gazing at the video content on the basis of head pose, which is estimated in two complementary ways: by a color-image-based module and by a depth-image-based module. The results of the two modules are combined to increase robustness. To evaluate the proposed method, a simulated TV-viewing test was conducted in a typical living space. The results suggest that the method can robustly detect the viewer's gaze and predict his or her rating of the video content.
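The abstract describes combining a color-image-based and a depth-image-based head-pose estimate to decide whether the viewer is gazing at the screen, without specifying the fusion rule. The sketch below illustrates one plausible approach, confidence-weighted averaging followed by an angular threshold; the `HeadPoseEstimate` type, the fusion rule, and the threshold values are assumptions for illustration, not the paper's actual method.

```python
from dataclasses import dataclass


@dataclass
class HeadPoseEstimate:
    yaw: float         # degrees; 0 = head pointing at the screen horizontally
    pitch: float       # degrees; 0 = head level with the screen vertically
    confidence: float  # 0..1, the module's self-reported reliability


def fuse_estimates(color: HeadPoseEstimate,
                   depth: HeadPoseEstimate) -> HeadPoseEstimate:
    """Fuse the two modules' poses by confidence-weighted averaging
    (an assumed fusion rule, not necessarily the paper's)."""
    total = color.confidence + depth.confidence
    if total == 0.0:
        raise ValueError("neither module produced a usable estimate")
    w_c = color.confidence / total
    w_d = depth.confidence / total
    return HeadPoseEstimate(
        yaw=w_c * color.yaw + w_d * depth.yaw,
        pitch=w_c * color.pitch + w_d * depth.pitch,
        confidence=max(color.confidence, depth.confidence),
    )


def is_gazing(pose: HeadPoseEstimate,
              yaw_limit: float = 30.0,
              pitch_limit: float = 20.0) -> bool:
    """Treat the viewer as gazing at the content when the fused head
    pose points near the screen (threshold values are illustrative)."""
    return abs(pose.yaw) <= yaw_limit and abs(pose.pitch) <= pitch_limit
```

With this rule, a confident color estimate dominates a weak depth estimate, so a failure in one sensing modality degrades the fused pose gracefully rather than discarding the frame.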