As a core technology of deep learning, convolutional neural networks (ConvNets) have been widely applied to a variety of computer vision tasks and have achieved state-of-the-art performance. However, they handle high-dimensional image signals inefficiently, because the number of training parameters grows dramatically with the dimensionality of the input. In this paper, we present MS-Net, a lightweight and efficient network for multi-dimensional (MD) image processing, which offers a promising way to handle MD images, especially on devices with limited computational capacity. MS-Net uses a series of one-dimensional convolution kernels and introduces a separable structure into the ConvNet throughout the learning process to handle MD image signals. Meanwhile, multiple group convolutions with kernel size 1 × 1 are used to extract channel information. A fusion module then fuses the information from each dimension and channel to extract the complete image features. As a result, the proposed MS-Net significantly reduces training complexity, parameter count and memory cost. MS-Net is evaluated on the 2D benchmarks CIFAR-10 and CIFAR-100 and on the 3D benchmark KTH. Extensive experimental results show that MS-Net achieves competitive performance at greatly reduced computational and memory cost compared with state-of-the-art ConvNet models.
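To make the parameter argument concrete, the following is a minimal sketch of the weight counts involved. It is not the MS-Net architecture itself: the abstract does not specify layer layouts, so the sketch assumes that one full k^d kernel is replaced by d one-dimensional kernels of length k, each mapping all input channels to all output channels, and that channel mixing uses a 1 × 1 group convolution. The function names and the example channel counts are hypothetical, and biases are ignored.

```python
def full_conv_params(c_in, c_out, k, dims):
    """Weights in a standard dims-dimensional convolution with a k**dims kernel."""
    return c_in * c_out * k ** dims


def separable_1d_params(c_in, c_out, k, dims):
    """Weights when each axis gets its own 1D kernel of length k.

    Assumed layout: one 1D convolution per axis, each c_in -> c_out.
    """
    return dims * c_in * c_out * k


def grouped_pointwise_params(c_in, c_out, groups):
    """Weights in a 1x1 group convolution with the given number of groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return groups * (c_in // groups) * (c_out // groups)


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3  # hypothetical layer sizes
    for dims in (2, 3):
        full = full_conv_params(c_in, c_out, k, dims)
        sep = separable_1d_params(c_in, c_out, k, dims)
        print(f"{dims}D: full={full}, separable={sep}, ratio={full / sep:.2f}")
    print("grouped 1x1 (4 groups):", grouped_pointwise_params(c_in, c_out, 4))
```

Under these assumptions, a 3D layer with 64 input and output channels and k = 3 needs 110,592 weights as a full convolution but 36,864 as three 1D convolutions. The saving grows with dimensionality because the full kernel scales as k^d while the separable form scales as d·k.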