Media Quality Assessment by Perceptual Gaze-Shift Patterns Discovery

Quality assessment is an indispensable technique in a large body of media applications, i.e., photo retargeting, scenery rendering, and video summarization. In this paper, a fully automatic framework is proposed to mimic how humans subjectively perceive media quality. The key is a locality-preserved sparse encoding algorithm that accurately discovers human gaze shifting paths from each image or video clip. In particular, we first extract local image descriptors from each image/video, and subsequently project them into the so-called perceptual space. Then, a nonnegative matrix factorization (NMF) algorithm is proposed that represents each graphlet by a linear and sparse combination of the basis ones. Since each graphlet is visually/semantically similar to its neighbors, a locality-preserved constraint is encoded into the NMF algorithm. Mathematically, the saliency of each graphlet is quantified by the norm of its sparse codes. Afterward, we sequentially link them into a path to simulate human gaze allocation. Finally, a probabilistic quality model is learned based on such paths extracted from a collection of photos/videos, which are marked as high quality ones via multiple Flickr users. Comprehensive experiments have demonstrated that: 1) our quality model outperforms many of its competitors significantly, and 2) the learned paths are on average 89.5% consistent with real human gaze shifting paths.

[1]  Dacheng Tao,et al.  BIT: Biologically Inspired Tracker , 2016, IEEE Transactions on Image Processing.

[2]  Xiao Liu,et al.  Semi-supervised Node Splitting for Random Forest Construction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[4]  Moshe Eizenman,et al.  General theory of remote gaze estimation using the pupil center and corneal reflections , 2006, IEEE Transactions on Biomedical Engineering.

[5]  Xuelong Li,et al.  Large-Scale Aerial Image Categorization Using a Multitask Topological Codebook , 2016, IEEE Transactions on Cybernetics.

[6]  Xuelong Li,et al.  Fusion of Multichannel Local and Global Structural Cues for Photo Aesthetics Evaluation , 2014, IEEE Transactions on Image Processing.

[7]  Ali Borji,et al.  Probabilistic learning of task-specific visual attention , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Jean-Marc Odobez,et al.  Gaze estimation from multimodal Kinect data , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Yue Gao,et al.  Feature Correlation Hypergraph: Exploiting High-order Potentials for Multimodal Recognition , 2014, IEEE Transactions on Cybernetics.

[11]  Takahiro Okabe,et al.  A Head Pose-free Approach for Appearance-based Gaze Estimation , 2011, BMVC.

[12]  Alan C. Bovik Handbook of Video Databases: Design and Applications , 2003 .

[13]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Ling Shao,et al.  Perceptually Guided Photo Retargeting , 2017, IEEE Transactions on Cybernetics.

[15]  Xuelong Li,et al.  Detecting Densely Distributed Graph Patterns for Fine-Grained Image Categorization , 2016, IEEE Transactions on Image Processing.

[16]  W. Chu Studying Aesthetics in Photographic Images Using a Computational Approach , 2013 .

[17]  Yue Gao,et al.  Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation , 2014, IEEE Transactions on Multimedia.

[18]  Wei Luo,et al.  Content-Based Photo Quality Assessment , 2013, IEEE Trans. Multim..

[19]  Xuelong Li,et al.  Semantic Photo Retargeting Under Noisy Image Labels , 2016, TOMM.

[20]  S Ullman,et al.  Shifts in selective visual attention: towards the underlying neural circuitry. , 1985, Human neurobiology.

[21]  Yi Yang,et al.  Weakly Supervised Human Fixations Prediction , 2016, IEEE Transactions on Cybernetics.

[22]  Ming-Hsuan Yang,et al.  Top-down visual saliency via joint CRF and dictionary learning , 2012, CVPR.

[23]  Ali Borji,et al.  State-of-the-Art in Visual Attention Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Jiebo Luo,et al.  Probabilistic Exposure Fusion , 2012, IEEE Transactions on Image Processing.

[25]  Nicu Sebe,et al.  Collaborative Sparse Coding for Multiview Action Recognition , 2016, IEEE MultiMedia.

[26]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[27]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[28]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[29]  Qiang Ji,et al.  Probabilistic gaze estimation without active personal calibration , 2011, CVPR 2011.

[30]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Ali Borji,et al.  Computational Modeling of Top-down Visual Attention in Interactive Environments , 2011, BMVC.

[32]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Wen Gao,et al.  Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video , 2010, International Journal of Computer Vision.

[34]  Xuelong Li,et al.  Image Categorization by Learning a Propagated Graphlet Path , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[35]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[36]  Dacheng Tao,et al.  Subspaces Indexing Model on Grassmann Manifold for Image Search , 2011, IEEE Transactions on Image Processing.

[37]  Yi Yang,et al.  A Probabilistic Associative Model for Segmenting Weakly Supervised Images , 2014, IEEE Transactions on Image Processing.

[38]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[39]  Nuria Oliver,et al.  Towards Computational Models of the Visual Aesthetic Appeal of Consumer Videos , 2010, ECCV.

[40]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[41]  Meng Wang,et al.  Biologically Inspired Media Quality Modeling , 2015, ACM Multimedia.

[42]  Harish Katti,et al.  An Eye Fixation Database for Saliency Detection in Images , 2010, ECCV.

[43]  Atsushi Nakazawa,et al.  Point of Gaze Estimation through Corneal Surface Reflection in an Active Illumination Environment , 2012, ECCV.

[44]  Xiao Liu,et al.  Probabilistic Graphlet Transfer for Photo Cropping , 2013, IEEE Transactions on Image Processing.

[45]  Ming-Sui Lee,et al.  Video Aesthetic Quality Assessment by Temporal Integration of Photo- and Motion-Based Features , 2013, IEEE Transactions on Multimedia.

[46]  Yi Yang,et al.  Discovering Discriminative Graphlets for Aerial Image Categories Recognition , 2013, IEEE Transactions on Image Processing.

[47]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.

[48]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Yoichi Sato,et al.  Sensation-based photo cropping , 2009, ACM Multimedia.

[50]  Shuo Wang,et al.  Predicting human gaze beyond pixels. , 2014, Journal of vision.

[51]  Michael Lindenbaum,et al.  Esaliency (Extended Saliency): Meaningful Attention Using Stochastic Image Modeling , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[53]  Chun Chen,et al.  Feature selection for fast speech emotion recognition , 2009, ACM Multimedia.

[54]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[55]  Bingbing Ni,et al.  Learning to photograph , 2010, ACM Multimedia.

[56]  John K. Tsotsos,et al.  Saliency, attention, and visual search: an information theoretic approach. , 2009, Journal of vision.

[57]  Laurent Itti,et al.  Integrating human context and occlusion reasoning to improve handheld object tracking , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[58]  Feiping Nie,et al.  Embedding new data points for manifold learning via coordinate propagation , 2007, Knowledge and Information Systems.

[59]  Lihi Zelnik-Manor,et al.  Context-Aware Saliency Detection , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[60]  Yongdong Zhang,et al.  Multiview Spectral Embedding , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[61]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[62]  Jean-Marc Odobez,et al.  Person independent 3D gaze estimation from remote RGB-D cameras , 2013, 2013 IEEE International Conference on Image Processing.

[63]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[64]  Xuelong Li,et al.  Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping , 2014, IEEE Transactions on Image Processing.

[65]  Xiao Liu,et al.  Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[66]  Martin D. Levine,et al.  Visual Saliency Based on Scale-Space Analysis in the Frequency Domain , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Markus A. Stricker,et al.  Similarity of color images , 1995, Electronic Imaging.

[68]  Christof Koch,et al.  Image Signature: Highlighting Sparse Salient Regions , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.