Learning a Discriminative Model for the Perception of Realism in Composite Images

What makes an image appear realistic? In this work, we answer this question from a data-driven perspective by learning the perception of visual realism directly from large amounts of data. In particular, we train a Convolutional Neural Network (CNN) model that distinguishes natural photographs from automatically generated composite images. The model learns to predict the visual realism of a scene in terms of color, lighting, and texture compatibility, without any human annotations of realism. Our model outperforms previous work that relies on hand-crafted heuristics on the task of classifying realistic vs. unrealistic photos. Furthermore, we apply our learned model to compute the optimal parameters of a compositing method, so as to maximize the visual realism score predicted by our CNN model. We demonstrate its advantage over existing methods via a human perception study.
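The idea of choosing compositing parameters to maximize a predicted realism score can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the per-channel color gain stands in for the compositing method, `toy_realism_score` is a hand-written stand-in for the learned CNN score (it only compares mean colors), and the coordinate-wise grid search is chosen for simplicity rather than taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def mean_color(pixels: Sequence[Pixel]) -> Tuple[float, ...]:
    """Average RGB color of a list of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def apply_gain(pixels: Sequence[Pixel], gain: Sequence[float]) -> List[Pixel]:
    """Hypothetical compositing adjustment: scale each channel, clip to [0, 1]."""
    return [
        tuple(min(1.0, max(0.0, p[c] * gain[c])) for c in range(3))
        for p in pixels
    ]


def toy_realism_score(foreground: Sequence[Pixel],
                      background: Sequence[Pixel]) -> float:
    """Stand-in for a learned realism score (NOT the paper's CNN).

    Higher is better: the negative squared distance between the mean
    foreground color and the mean background color.
    """
    fg, bg = mean_color(foreground), mean_color(background)
    return -sum((f - b) ** 2 for f, b in zip(fg, bg))


def optimize_gain(foreground: Sequence[Pixel],
                  background: Sequence[Pixel],
                  score: Callable[[Sequence[Pixel], Sequence[Pixel]], float],
                  candidates: Sequence[float],
                  sweeps: int = 3) -> Tuple[float, ...]:
    """Coordinate-wise grid search for the gain that maximizes `score`."""
    gain = [1.0, 1.0, 1.0]
    for _ in range(sweeps):
        for c in range(3):
            # Try every candidate for channel c, keeping the other channels fixed.
            gain[c] = max(
                candidates,
                key=lambda g: score(
                    apply_gain(foreground, gain[:c] + [g] + gain[c + 1:]),
                    background,
                ),
            )
    return tuple(gain)


if __name__ == "__main__":
    fg = [(0.8, 0.4, 0.2), (0.6, 0.3, 0.1)]   # warm-toned pasted object
    bg = [(0.35, 0.35, 0.35)] * 4             # neutral gray scene
    grid = [i / 20 for i in range(10, 41)]    # gains from 0.5 to 2.0
    best = optimize_gain(fg, bg, toy_realism_score, grid)
    print("gain:", best)
    print("score before:", toy_realism_score(fg, bg))
    print("score after: ", toy_realism_score(apply_gain(fg, best), bg))
```

In a real system the stand-in scorer would be replaced by the trained classifier's output, and a gradient-based optimizer would likely be preferable to a grid search once the score is differentiable in the compositing parameters.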
