Guiding Intelligent Surveillance Systems by Learning-by-Synthesis Gaze Estimation

Abstract

We describe a novel learning-by-synthesis method for estimating gaze direction in automated intelligent surveillance systems. Recent progress in integrated learning has made it possible to train models on synthetic images, which can substantially reduce the cost in human and material resources. However, because real and synthetic images follow different distributions, models trained on synthetic images do not reach the performance of models trained on real images. Previous methods addressed this problem by learning a model that increases the realism of the synthetic images. Such methods, however, leave image distortion unresolved, and the realism they achieve is unstable. To overcome these limitations, we propose a new architecture for refining synthetic images. Drawing on the idea of style transfer, our method effectively reduces image distortion and minimizes the need for annotated real data. We show that it produces highly realistic images, as demonstrated through qualitative results and a user study. We quantitatively evaluate the generated images by training a gaze estimation model on them: training on the refined synthetic dataset yields significant improvements over training on the raw synthetic dataset.
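The style-transfer idea mentioned above is commonly formulated, following Gatys et al. and Johnson et al., as a weighted sum of two terms. A content loss keeps the refined image's features close to those of the synthetic input, and a style loss matches Gram-matrix statistics of a real reference image. The abstract does not specify the loss this method uses, so the following is only a sketch of that generic formulation under stated assumptions. The feature maps are plain NumPy arrays of shape (channels, height, width) standing in for activations of a pretrained CNN such as VGG. The function names and the weights `content_weight` and `style_weight` are hypothetical.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (channels, height, width) feature map, normalized by its size.

    Entry (i, j) is the inner product of channels i and j over all spatial
    positions, so spatial layout is discarded and only channel co-activation
    statistics (texture/style) remain.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def content_loss(gen_feats: np.ndarray, src_feats: np.ndarray) -> float:
    """Mean squared feature difference; penalizes structural change to the synthetic input."""
    return float(np.mean((gen_feats - src_feats) ** 2))


def style_loss(gen_feats: np.ndarray, ref_feats: np.ndarray) -> float:
    """Squared Frobenius distance between Gram matrices of generated and real-reference features."""
    diff = gram_matrix(gen_feats) - gram_matrix(ref_feats)
    return float(np.sum(diff ** 2))


def refinement_loss(gen_feats: np.ndarray, synth_feats: np.ndarray, real_feats: np.ndarray,
                    content_weight: float = 1.0, style_weight: float = 10.0) -> float:
    """Hypothetical combined objective: stay close to the synthetic image, look like a real one."""
    return (content_weight * content_loss(gen_feats, synth_feats)
            + style_weight * style_loss(gen_feats, real_feats))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for CNN activations of synthetic, real, and refined eye images.
    synth = rng.standard_normal((8, 16, 16))
    real = rng.standard_normal((8, 16, 16))
    refined = synth + 0.1 * rng.standard_normal(synth.shape)
    print("loss:", refinement_loss(refined, synth, real))
```

Because the Gram matrix discards spatial layout, the style term compares only texture and intensity statistics, while the content term penalizes spatial changes. This separation is one plausible way to raise realism without distorting eye geometry, on which the gaze labels depend; it is an illustration, not the authors' stated design.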