L-CoIns: Language-based Colorization With Instance Awareness

Existing language-based colorization methods still have difficulty in distinguishing instances corresponding to the same object words. In this paper, we propose a transformer-based framework to automatically aggregate similar image patches and achieve instance awareness without any additional knowledge. By applying our presented luminance augmentation and counter-color loss to break down the statistical correlation between luminance and color words, our model is driven to synthesize colors with better descriptive consistency. We further collect a dataset that provides distinctive visual characteristics and detailed language descriptions for multiple instances in the same image.
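The luminance augmentation described above can be sketched as a random perturbation of the L (luminance) channel of a Lab image while leaving the a/b (color) channels untouched, so that luminance no longer predicts the color named in the caption. This is a minimal illustrative sketch, not the paper's exact procedure; the function name and perturbation ranges are assumptions.

```python
import numpy as np

def augment_luminance(lab_image, rng=None):
    """Randomly rescale and shift the L channel of a Lab image.

    lab_image: float array of shape (H, W, 3); L in [0, 100], a/b in [-128, 127].
    The a/b (color) channels are left unchanged, decorrelating luminance
    from the color words in the paired description.
    The perturbation ranges below are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    scale = rng.uniform(0.7, 1.3)      # random contrast on luminance
    shift = rng.uniform(-20.0, 20.0)   # random brightness offset
    out = lab_image.astype(np.float64).copy()
    out[..., 0] = np.clip(out[..., 0] * scale + shift, 0.0, 100.0)
    return out
```

A model trained on such augmented inputs cannot rely on brightness cues (e.g. "dark regions are probably brown") and must instead follow the language description.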
