Front-End Smart Visual Sensing and Back-End Intelligent Analysis: A Unified Infrastructure for Economizing the Visual System of City Brain

Visual data acquired from the ubiquitous visual sensors deployed in metropolitan areas are of great value for enhancing the effectiveness and guiding the future development of smart cities. In this paper, we introduce the essential building blocks of a unified visual data management and analysis infrastructure, which serves as the foundation for an economical visual system in the city brain and facilitates the utilization of visual signals in the artificial intelligence era. In particular, we start with a discussion of front-end smart visual sensing in the context of economical communication and service over heterogeneous networks, and detail the functionalities and necessities of compact visual feature and deep learning model representations. Subsequently, the utility of the infrastructure is demonstrated through two intelligent applications at the back end: vehicle re-identification and person re-identification. The standardization efforts regarding compact feature and deep neural network representations, which are key ingredients of this infrastructure and greatly facilitate the construction of the visual system in the city brain, are also discussed. Finally, we envision how potential issues regarding economical visual communication for future smart cities might be pragmatically approached within this unified infrastructure.
