Towards a comprehensive computational model foraesthetic assessment of videos

In this paper we propose a novel aesthetic model emphasizing psycho-visual statistics extracted from multiple levels in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, that invoke specific sentiments such as "colorful clouds", "smiling face" etc. and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewer's fixation priors, the trajectories are labeled as foreground, and background/camera on which statistics are computed. Additionally, spatio-temporal local binary patterns are computed that capture texture variations in a given shot. Classifiers are trained on individual feature representations independently. On thorough evaluation of 9 different types of features, we select the best features from each level -- dark channel, affect and camera motion statistics. Next, corresponding classifier scores are integrated in a sophisticated low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.

[1]  J. J. Vos Colorimetric and photometric properties of a 2° fundamental observer , 1978 .

[2]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[4]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Matti Pietikäinen,et al.  Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[7]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[8]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[9]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Nuria Oliver,et al.  Towards Computational Models of the Visual Aesthetic Appeal of Consumer Videos , 2010, ECCV.

[11]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[12]  Mubarak Shah,et al.  A holistic approach to aesthetic enhancement of photographs , 2011, TOMCCAP.

[13]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[14]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[15]  Xiaoou Tang,et al.  Single Image Haze Removal Using Dark Channel Prior , 2011 .

[16]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.

[17]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Dong Liu,et al.  Robust late fusion with rank minimization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Damon M. Chandler,et al.  ${\bf S}_{3}$: A Spectral and Spatial Measure of Local Perceived Sharpness in Natural Images , 2012, IEEE Transactions on Image Processing.

[20]  Wei Luo,et al.  Content-Based Photo Quality Assessment , 2013, IEEE Trans. Multim..

[21]  W. Chu Studying Aesthetics in Photographic Images Using a Computational Approach , 2013 .

[22]  Rongrong Ji,et al.  Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.