Learning Spatiotemporal Features for Video Semantic Segmentation Using 3D Convolutional Neural Networks