SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification

Improving existing neural network architectures can involve several design choices, such as manipulating the loss function, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The last approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods addressing this issue, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient in the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through interlaced auto-encoders, and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset, where the SIRe-extended architectures achieve significantly increased performance across all models, thus confirming the effectiveness of the presented approach.
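To make the vanishing gradient problem and the role of residual connections concrete, the following is a minimal toy sketch, not the SIRe architecture itself. It assumes a chain of scalar sigmoid layers with a fixed weight; the function name `input_gradient` and all parameter values are hypothetical and chosen only for illustration. It compares the gradient of the output with respect to the input for a plain chain and for a chain where each layer adds its input back to its output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, weight=1.0, x=0.5, residual=False):
    """Return d(output)/d(input) for a chain of scalar sigmoid layers.

    Illustrative assumption: every layer computes sigmoid(weight * x),
    optionally with an identity (residual) path added to its output.
    """
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * x)
        # Derivative of sigmoid(weight * x) with respect to x (at most 0.25 * weight).
        local = weight * s * (1.0 - s)
        if residual:
            # y = x + f(x)  =>  dy/dx = 1 + f'(x), so each factor is at least 1.
            x = x + s
            grad *= 1.0 + local
        else:
            # y = f(x)  =>  dy/dx = f'(x), so the product shrinks with depth.
            x = s
            grad *= local
    return grad


plain = input_gradient(depth=30)
skip = input_gradient(depth=30, residual=True)
print(f"plain chain gradient:    {plain:.3e}")
print(f"residual chain gradient: {skip:.3e}")
```

In this toy setting the plain chain multiplies thirty factors that are each at most 0.25, so the gradient collapses towards zero, while the identity path keeps every factor at or above 1. The interlaced auto-encoders described in the abstract are a separate mechanism and are not modelled here.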
