Task-Agnostic Vision Transformer for Distributed Learning of Image Processing

Distributed learning approaches have recently been studied as a way to use data from multiple sources without sharing it, but they are usually unsuitable for applications in which each client carries out a different task. Meanwhile, the Transformer has been widely explored in computer vision because of its capability to learn a common representation through global attention. Leveraging these advantages of the Transformer, we present a new distributed learning framework for multiple image processing tasks that allows clients to learn distinct tasks with their local data. The framework rests on a disentangled representation of local and non-local features, obtained with a task-specific head and tail on each client and a task-agnostic Vision Transformer body on the server. Each client uses its task-specific networks to learn a translation between its own task and a common representation, while the Transformer body on the server learns global attention between the features embedded in that representation. To enable the decomposition between the task-specific and common representations, we propose a training strategy that alternates between the clients and the server. Experimental results on distributed learning for various tasks show that our method synergistically improves the performance of each client using only that client's own data.
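The alternating strategy described above can be illustrated with a toy sketch. This is not the authors' implementation: the scalar `head`, `tail`, and `body` parameters, the linear "tasks", and the function names (`client_step`, `server_step`, `train`) are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It only shows the update schedule, in which each client updates its task-specific parameters while the shared body is frozen, and the server then updates the body while the client parameters are frozen.

```python
from dataclasses import dataclass


@dataclass
class Client:
    head: float   # task-specific head (hypothetical scalar stand-in for a network)
    tail: float   # task-specific tail (hypothetical scalar stand-in for a network)
    scale: float  # toy task: this client must learn the mapping x -> scale * x


def forward(client, body, x):
    # head -> shared body -> tail, all reduced to scalar multiplications
    return client.tail * body * client.head * x


def loss(client, body, xs):
    # mean squared error of the client's toy task on its local data
    return sum((forward(client, body, x) - client.scale * x) ** 2 for x in xs) / len(xs)


def client_step(client, body, xs, lr):
    # Client phase: the shared body is frozen, only head and tail are updated.
    g_head = g_tail = 0.0
    for x in xs:
        err = forward(client, body, x) - client.scale * x
        g_head += 2 * err * client.tail * body * x
        g_tail += 2 * err * body * client.head * x
    client.head -= lr * g_head / len(xs)
    client.tail -= lr * g_tail / len(xs)


def server_step(clients, body, xs, lr):
    # Server phase: heads and tails are frozen, the body is updated with the
    # gradient averaged over all clients.
    g = 0.0
    for c in clients:
        for x in xs:
            err = forward(c, body, x) - c.scale * x
            g += 2 * err * c.tail * c.head * x
    return body - lr * g / (len(clients) * len(xs))


def train(clients, body, xs, rounds=200, lr=0.05):
    # Alternate between the client phase and the server phase each round.
    for _ in range(rounds):
        for c in clients:
            client_step(c, body, xs, lr)
        body = server_step(clients, body, xs, lr)
    return body


if __name__ == "__main__":
    data = [0.5, 1.0, 1.5]
    clients = [Client(1.0, 1.0, scale=2.0), Client(1.0, 1.0, scale=0.5)]
    body = train(clients, 1.0, data)
    for c in clients:
        print(c, loss(c, body, data))
```

In this toy setting the two clients pull the shared body in opposite directions, yet each can still fit its own task because its head and tail absorb the task-specific part. In the actual framework these scalars would be a convolutional head and tail on each client and a Vision Transformer on the server.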
