MS-Net: A lightweight separable ConvNet for multi-dimensional image processing

As the core technology of deep learning, convolutional neural networks have been widely applied in a variety of computer vision tasks and have achieved state-of-the-art performance. However, it’s difficult and inefficient for them to deal with high dimensional image signals due to the dramatic increase of training parameters. In this paper, we present a lightweight and efficient MS-Net for the multi-dimensional(MD) image processing, which provides a promising way to handle MD images, especially for devices with limited computational capacity. It takes advantage of a series of one dimensional convolution kernels and introduces a separable structure in the ConvNet throughout the learning process to handle MD image signals. Meanwhile, multiple group convolutions with kernel size 1 × 1 are used to extract channel information. Then the information of each dimension and channel is fused by a fusion module to extract the complete image features. Thus the proposed MS-Net significantly reduces the training complexity, parameters and memory cost. The proposed MS-Net is evaluated on both 2D and 3D benchmarks CIFAR-10, CIFAR-100 and KTH. Extensive experimental results show that the MS-Net achieves competitive performance with greatly reduced computational and memory cost compared with the state-of-the-art ConvNet models.

[1]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008, International Journal of Computer Vision.

[2]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[4]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[5]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[6]  Recent Advances in Computer Vision - Theories and Applications , 2019, Recent Advances in Computer Vision.

[7]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.