In order to detect traffic congestion from the current surveillance system, fast and reliable detection are required using in practice. In the consideration of the transmission capability, high frame rate video is hard to use in the real situation. In this paper, ultra-low frame rate image are extracted from the surveillance video camera. From those images, semantic segmentation are used to label out vehicle, lane, road isolation belt and other commonly seen items. Atrous convolution are used and show its effectiveness to extract feature of various items. After successful detection vehicle and lane, we proposed a method to identify whether it congested or not. This method fused deep learning-based approach and feature extraction of vehicle spacing to robustly identify the road state. The detection is fast and reliable and could satisfy the practical applications. Automatic identification of the road congestion could further prompt the development of the intelligent transport system (ITS).