Optimizing small channel 3D convolution on GPU with tensor core