Traffic Scene Classification on a Representation Budget

Visual cues can be used alongside GPS positioning and digital maps to improve understanding of the vehicle's environment in fleet management systems. Such systems are limited in both bandwidth and storage space, so minimizing the size of transmitted and stored visual data is a priority. In this paper, we present efficient strategies for computing very short image representations suitable for classifying various types of traffic scenes in fleet management systems. We anticipate that the set of classes of interest will change over time, so we consider image representations that can be trained without knowing the labels of the target dataset. We empirically evaluate and compare the presented methods on a contributed dataset of 11447 labeled traffic scenes. Our results indicate that excellent classification results can be achieved with very short image representations, and that fine-tuning on images from the target dataset is not mandatory. Image descriptors can be as short as 128 components while still offering good performance, even in the presence of adverse weather or illumination conditions.
