AI-Oriented Large-Scale Video Management for Smart City: Technologies, Standards, and Beyond

Deep learning has achieved substantial success in intelligent video analysis. However, deploying deep neural network models for large-scale video analysis in practice still poses unprecedented challenges. Deep feature coding, as an alternative to video coding, offers a practical solution for handling large-scale video surveillance data. To enable interoperability in the context of deep feature coding, standardization is both urgent and important. This paper envisions a future deep feature coding standard for AI-oriented large-scale video management and discusses existing techniques, standards, and possible solutions to the open problems.
