LDS-Inspired Residual Networks

Residual networks (ResNets) have introduced a milestone for the deep learning community due to their outstanding performance in diverse applications. They enable efficient training of increasingly deep networks, reducing the training difficulty and error. The main intuition behind them is that, instead of mapping the input information, they are mapping a residual part of it. Since the original work, a lot of extensions have been proposed to improve information mapping. In this paper, a novel extension of the residual block is proposed inspired by linear dynamical systems (LDSs), called LDS-ResNet. Specifically, a new module is presented that improves mapping of residual information by transforming it in a hidden state and then mapping it back to the desired feature space using convolutional layers. The proposed module is utilized to construct multi-branch residual blocks for convolutional neural networks. An exploration of possible architectural choices is presented and evaluated. Experimental results show that LDS-ResNet outperforms the original ResNet in image classification and object detection tasks on public datasets such as CIFAR-10/100, ImageNet, VOC, and MOT2017. Moreover, its performance boost is complementary to other extensions of the original network such as pre-activation and bottleneck, as well as stochastic training and Squeeze-Excitation.

[1]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[2]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[3]  Nuno Vasconcelos,et al.  Classifying Video with Kernel Dynamic Textures , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[6]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[7]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[8]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Geoffrey Zweig,et al.  The microsoft 2016 conversational speech recognition system , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[12]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[13]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[14]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[15]  Stephen J. Maybank,et al.  Learning Human Actions by Combining Global Dynamics and Local Appearance , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Diogo Almeida,et al.  Resnet in Resnet: Generalizing Residual Architectures , 2016, ArXiv.

[17]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[18]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[20]  René Vidal,et al.  Categorizing Dynamic Textures Using a Bag of Dynamical Systems , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Nikos Grammalidis,et al.  Grading of invasive breast carcinoma through Grassmannian VLAD encoding , 2017, PloS one.

[22]  Nikolaos Grammalidis,et al.  Higher Order Linear Dynamical Systems for Smoke Detection in Video Surveillance Applications , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[23]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Nikos Grammalidis,et al.  Classification of Multidimensional Time-Evolving Data Using Histograms of Grassmannian Points , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[26]  Yuping Wang,et al.  Non-Linear Dynamic Texture Analysis and Synthesis Using Constrained Gaussian Process Latent Variable Model , 2009, 2009 Pacific-Asia Conference on Circuits, Communications and Systems.

[27]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.