Paper information - SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification

Improving existing neural network architectures can involve several design choices, such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The last approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, the increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods addressing this issue, we introduce an interlaced multi-task learning strategy, called SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through interlaced auto-encoders, and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset, where the SIRe-extended architectures achieve significantly improved performance across all models, thus confirming the effectiveness of the presented approach.
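The link between depth, vanishing gradients, and residual connections mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the SIRe architecture: it assumes a chain of scalar tanh layers with one shared weight, and the function name `input_gradient` and its parameters are hypothetical, chosen only for this illustration. It multiplies the per-layer derivatives (the chain rule) to obtain the gradient of the output with respect to the input, with and without an identity shortcut around each layer.

```python
import math


def input_gradient(depth, weight=0.5, x0=1.0, residual=False):
    """Return d(output)/d(input) for a chain of `depth` scalar tanh layers.

    Plain layer:    x_next = tanh(weight * x)
    Residual layer: x_next = x + tanh(weight * x)

    Toy model for illustration only (assumed setup, not taken from the paper).
    """
    x = x0
    grad = 1.0
    for _ in range(depth):
        activation = math.tanh(weight * x)
        # Derivative of tanh(weight * x) with respect to x.
        local = weight * (1.0 - activation ** 2)
        if residual:
            # The identity shortcut adds a constant 1 to the layer derivative.
            grad *= 1.0 + local
            x = x + activation
        else:
            grad *= local
            x = activation
    return grad


if __name__ == "__main__":
    for depth in (5, 15, 30):
        plain = input_gradient(depth)
        shortcut = input_gradient(depth, residual=True)
        print(f"depth={depth:2d}  plain={plain:.3e}  residual={shortcut:.3e}")
```

In this toy setting each plain layer contributes a factor of at most `weight`, so the product shrinks geometrically with depth, whereas each residual layer contributes a factor of at least 1, so the gradient reaching the input does not decay. Real networks involve matrices and learned weights, so this should be read only as intuition for why shortcut paths help.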

Luigi Cinque | Gian Luca Foresti | Danilo Avola | Alessio Fagioli
