Multi-View Vehicle Re-Identification using Temporal Attention Model and Metadata Re-ranking

Object re-identification (ReID) is an arduous task which requires matching an object across different nonoverlapping camera views. Recently, many researchers are working on person ReID by taking advantages of appearance, human pose, temporal constraints, etc. However, vehicle ReID is even more challenging because vehicles have fewer discriminant features than human due to viewpoint orientation, changes in lighting condition and inter-class similarity. In this paper, we propose a viewpoint-aware temporal attention model for vehicle ReID utilizing deep learning features extracted from consecutive frames with vehicle orientation and metadata attributes (i.e., type, brand, color) being taken into consideration. In addition, re-ranking with soft decision boundary is applied as post-processing for result refinement. The proposed method is evaluated on CVPR AI City Challenge 2019 dataset, achieving mAP of 79.17% with the second place ranking in the competition.

[1]  Konrad Schindler,et al.  Towards Scene Understanding with Detailed 3D Object Representations , 2014, International Journal of Computer Vision.

[2]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[3]  Liang Zheng,et al.  Re-ranking Person Re-identification with k-Reciprocal Encoding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kil-Taek Lim,et al.  Vehicle Type Classification Using Bagging and Convolutional Neural Network on Multi View Surveillance Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Farzin Aghdasi,et al.  Vehicle Re-identification: an Efficient Baseline Using Triplet Embedding , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[6]  Zheng Wang,et al.  Ranking Optimization for Person Re-identification via Similarity and Dissimilarity , 2015, ACM Multimedia.

[7]  Wei Zeng,et al.  Exploiting Multi-grain Ranking Constraints for Precisely Searching Visually-similar Vehicles , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Shihong Du,et al.  Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[9]  Shiliang Zhang,et al.  RAM: A Region-Aware Deep Model for Vehicle Re-Identification , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[10]  Yi Yang,et al.  Person Re-identification: Past, Present and Future , 2016, ArXiv.

[11]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[12]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[14]  K. Madhava Krishna,et al.  Shape priors for real-time monocular object localization in dynamic environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[17]  Jitendra Malik,et al.  Learning Category-Specific Deformable 3D Models for Object Reconstruction , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[19]  Gian Luca Foresti,et al.  Discriminant Context Information Analysis for Post-Ranking Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[20]  Xiaogang Wang,et al.  DeepReID: Deep Filter Pairing Neural Network for Person Re-identification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yi Yang,et al.  Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[25]  Ling-Yu Duan,et al.  Incorporating intra-class variance to fine-grained visual recognition , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[26]  Alfredo Gardel Vicente,et al.  Person Re-Identification Ranking Optimisation by Discriminant Context Information Analysis , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  K. Madhava Krishna,et al.  The Earth Ain't Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28]  Bingpeng Ma,et al.  A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Zheng Wang,et al.  Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing , 2016, IEEE Transactions on Multimedia.

[30]  Jenq-Neng Hwang,et al.  CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Qi Tian,et al.  MARS: A Video Benchmark for Large-Scale Person Re-Identification , 2016, ECCV.

[33]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[34]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[35]  Shishir K. Shah,et al.  A survey of approaches and trends in person re-identification , 2014, Image Vis. Comput..

[36]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[37]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Ramakant Nevatia,et al.  Revisiting Temporal Modeling for Video-based Person ReID , 2018, ArXiv.