Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features

Color and tone stylization in images and videos strives to enhance unique themes with artistic color and tone adjustments. It has a broad range of applications, from professional image postprocessing to photo sharing over social networks. Mainstream photo enhancement software, such as Adobe Lightroom and Instagram, provides users with predefined styles, which are often hand-crafted through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of image content, and the global color transforms they apply limit the range of artistic styles they can represent. Stylistic enhancement, by contrast, needs to apply distinct adjustments to different semantic regions, an ability that enables a broader range of visual styles. In this paper, we first propose a novel deep learning architecture for exemplar-based image stylization, which learns local enhancement styles from image pairs. Our architecture consists of fully convolutional networks for automatic semantics-aware feature extraction and fully connected neural layers for adjustment prediction. Image stylization can be accomplished efficiently with a single forward pass through our deep network. To extend our deep network from image stylization to video stylization, we exploit temporal superpixels to facilitate the transfer of artistic styles from image exemplars to videos. Experiments on a number of data sets for image stylization, as well as on a diverse set of video clips, demonstrate the effectiveness of our deep learning architecture.
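To make the two-stage design concrete, the following is a minimal NumPy sketch of a single forward pass: a convolutional stage produces per-pixel features, and fully connected layers map each feature vector to a local color adjustment. It is an illustration under stated assumptions, not the network described in the paper. The single 3x3 filter bank, the layer sizes, the per-pixel 3x4 affine color transform used as the "adjustment", and all function and parameter names (`conv_features`, `predict_adjustment`, `stylize`, `init_params`) are hypothetical choices made for this example, and the weights are untrained.

```python
import numpy as np


def conv_features(image, kernels):
    """Stand-in for the fully convolutional feature extractor.

    Applies one bank of 3x3 filters with edge padding, followed by ReLU.
    image: (H, W, 3) floats in [0, 1]; kernels: (3, 3, 3, C) -> (H, W, C).
    """
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros((h, w, kernels.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w, :]   # shifted view, (H, W, 3)
            out += patch @ kernels[dy, dx]            # (H, W, 3) @ (3, C)
    return np.maximum(out, 0.0)


def predict_adjustment(features, w1, b1, w2, b2):
    """Fully connected layers applied independently at every pixel.

    Returns one 3x4 affine color transform per pixel (an assumed
    parameterization of the local adjustment).
    """
    hidden = np.maximum(features @ w1 + b1, 0.0)      # (H, W, hidden)
    coeffs = hidden @ w2 + b2                         # (H, W, 12)
    return coeffs.reshape(features.shape[:2] + (3, 4))


def stylize(image, params):
    """Single forward pass: features -> per-pixel transform -> output colors."""
    feats = conv_features(image, params["kernels"])
    transform = predict_adjustment(
        feats, params["w1"], params["b1"], params["w2"], params["b2"]
    )
    ones = np.ones(image.shape[:2] + (1,))
    homogeneous = np.concatenate([image, ones], axis=-1)   # (H, W, 4)
    out = np.einsum("hwij,hwj->hwi", transform, homogeneous)
    return np.clip(out, 0.0, 1.0)


def init_params(channels=8, hidden=16, seed=0):
    """Untrained parameters; the output layer starts at the identity transform."""
    rng = np.random.default_rng(seed)
    identity = np.hstack([np.eye(3), np.zeros((3, 1))]).ravel()
    return {
        "kernels": rng.normal(0.0, 0.1, (3, 3, 3, channels)),
        "w1": rng.normal(0.0, 0.1, (channels, hidden)),
        "b1": np.zeros(hidden),
        "w2": np.zeros((hidden, 12)),   # zero weights: every pixel gets b2
        "b2": identity,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.random((4, 5, 3))
    styled = stylize(image, init_params())
    print(styled.shape)
```

Because the transform is predicted per pixel from local features, different regions of the same image can receive different adjustments, which is the property a single global color transform lacks. In this sketch, training on exemplar image pairs would amount to fitting the parameters so that `stylize(input)` approximates the stylized exemplar; that step is omitted here.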
