ExpNet: A Unified Network for Expert-Level Classification

Unlike general visual classification, some classification tasks are more challenging because they require recognizing professional categories of images. In this paper, we call them expert-level classification. Previous work on fine-grained visual classification (FGVC) has made considerable progress on specific sub-tasks. However, these methods are difficult to extend to the general case, which relies on a comprehensive analysis of part-global correlations and on hierarchical feature interactions. We propose Expert Network (ExpNet) to address the unique challenges of expert-level classification through a unified network. In ExpNet, we hierarchically decouple the part and context features and process them individually using a novel attention mechanism called Gaze-Shift. In each stage, Gaze-Shift produces a focal-part feature for the subsequent abstraction and memorizes a context-related embedding. We then fuse the final focal embedding with all memorized context-related embeddings to make the prediction. This architecture realizes dual-track processing of partial and global information together with hierarchical feature interactions. We conduct experiments on three representative expert-level classification tasks: FGVC, disease classification, and artwork attribute classification. In these experiments, ExpNet outperforms state-of-the-art methods across a wide range of fields, indicating its effectiveness and generalization. The code will be made publicly available.
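The data flow described above (a focal feature passed on at each stage, a context embedding memorized at each stage, and a final fusion) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how Gaze-Shift separates focal from context tokens, so the norm-based saliency score, the `keep_ratio` parameter, the mean pooling, the concatenation-based fusion, and the function names `gaze_shift` and `expnet_forward` are all assumptions made for illustration only.

```python
import math
import random


def mean_vec(vectors):
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def gaze_shift(tokens, keep_ratio=0.5):
    """Hypothetical stand-in for Gaze-Shift.

    Splits tokens into a focal set and a context set. Saliency is the
    L2 norm of each token (an assumption, not the paper's rule).
    Returns (focal_tokens, context_embedding).
    """
    scores = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    k = max(1, int(len(tokens) * keep_ratio))
    focal = [tokens[i] for i in sorted(order[:k])]
    # Fall back to the focal tokens if nothing is left for the context.
    context_tokens = [tokens[i] for i in order[k:]] or focal
    return focal, mean_vec(context_tokens)


def expnet_forward(tokens, num_stages=3):
    """Run the staged focal/context split and fuse the results.

    Fusion is plain concatenation here (an assumption); a real model
    would feed the fused vector to a learned classifier head.
    """
    memory = []          # context embeddings memorized at each stage
    focal = tokens
    for _ in range(num_stages):
        focal, context_embedding = gaze_shift(focal)
        memory.append(context_embedding)
    final_focal = mean_vec(focal)
    fused = final_focal + [x for ctx in memory for x in ctx]
    return fused, memory


# Usage: 16 random tokens of dimension 4, three stages.
random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]
fused, memory = expnet_forward(tokens)
print(len(memory), len(fused))
```

With three stages and 4-dimensional tokens, the fused vector holds one final focal embedding plus three memorized context embeddings, which is what the dual-track description suggests the prediction head would consume.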
