BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance

In this paper, we focus on fine-grained recognition of vehicles, mainly in traffic surveillance applications. We propose an approach that is orthogonal to recent advancements in fine-grained recognition (automatic part discovery and bilinear pooling). In addition, in contrast to other methods focused on fine-grained recognition of vehicles, we do not limit ourselves to a frontal/rear viewpoint, but allow the vehicles to be seen from any viewpoint. Our approach is based on 3-D bounding boxes built around the vehicles. The bounding box can be constructed automatically from traffic surveillance data. For scenarios where precise construction is not possible, we propose a method for estimating the 3-D bounding box. The 3-D bounding box is used to normalize the image viewpoint by “unpacking” the image into a plane. We also propose two augmentations applied during the training of convolutional neural networks (CNNs): randomly altering the color of the image, and adding a rectangle filled with random noise at a random position in the image. We have collected a large fine-grained vehicle data set, BoxCars116k, with 116k images of vehicles from various viewpoints taken by numerous surveillance cameras. We performed a number of experiments, which show that our proposed method significantly improves CNN classification accuracy (the accuracy is increased by up to 12 percentage points and the error is reduced by up to 50% compared with CNNs without the proposed modifications). We also show that our method outperforms the state-of-the-art methods for fine-grained recognition.
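
The two training-time augmentations described in the abstract (random color alteration and a randomly placed rectangle of random noise) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the per-channel additive color shift, the shift range `max_shift`, and the rectangle size limit `max_frac` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def alter_color(image, rng, max_shift=0.2):
    """Add a random offset to each color channel (assumed form of color alteration).

    `image` is a float array of shape (H, W, C) with values in [0, 1].
    `max_shift` is a hypothetical parameter, not taken from the paper.
    """
    shift = rng.uniform(-max_shift, max_shift, size=(1, 1, image.shape[2]))
    return np.clip(image + shift, 0.0, 1.0)


def add_noise_rectangle(image, rng, max_frac=0.3):
    """Overwrite a randomly placed rectangle with uniform random noise.

    `max_frac` (hypothetical) bounds the rectangle side length relative
    to the image side length.
    """
    h, w, c = image.shape
    # Side lengths are drawn from 1 .. floor(side * max_frac), at least 1 pixel.
    rect_h = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    rect_w = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    # Top-left corner is chosen so the rectangle stays inside the image.
    top = int(rng.integers(0, h - rect_h + 1))
    left = int(rng.integers(0, w - rect_w + 1))
    out = image.copy()
    out[top:top + rect_h, left:left + rect_w] = rng.uniform(
        0.0, 1.0, size=(rect_h, rect_w, c)
    )
    return out


def augment(image, rng):
    """Apply both augmentations to one training image."""
    return add_noise_rectangle(alter_color(image, rng), rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gray = np.full((64, 64, 3), 0.5)  # stand-in for a vehicle image
    augmented = augment(gray, rng)
    print(augmented.shape, float(augmented.min()), float(augmented.max()))
```

In a training pipeline, a function like `augment` would be applied to each image only at training time, leaving validation and test images unmodified.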
