Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, faces are detected in each frame, and the video stream is split into sequences of frames (tracks) that each contain the face region of one person. Second, tracks that contain the same face are grouped using face verification algorithms and hierarchical agglomerative clustering. Gender and age are estimated for each cluster (person) to make the organized video collection easier to use. Particular attention is paid to the aggregation of the features extracted from each frame with deep convolutional neural networks. Experimental results on the YTF and IJB-A datasets show that the most accurate and fastest solution is obtained by matching the normalized average of the feature vectors of all frames in a track.
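The second stage can be illustrated with a minimal sketch. It assumes each frame has already been mapped to a feature vector by some CNN, so the 2-D vectors below are hypothetical toy data. The Euclidean distance, the average linkage, the threshold value, and the function names `track_descriptor` and `cluster_tracks` are illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit Euclidean length (left unchanged if it is all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def track_descriptor(frame_features):
    # Aggregate a track: average the per-frame feature vectors, then normalize.
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[i] for f in frame_features) / n for i in range(dim)]
    return l2_normalize(mean)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_tracks(descriptors, threshold):
    # Agglomerative clustering with average linkage (an assumed choice).
    # Merging stops once the two closest clusters are farther apart than threshold.
    clusters = [[i] for i in range(len(descriptors))]

    def linkage(c1, c2):
        total = sum(euclidean(descriptors[i], descriptors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy data: three tracks, each a list of per-frame feature vectors.
tracks = [
    [[1.0, 0.0], [0.9, 0.1]],   # track 0
    [[0.95, 0.05]],             # track 1, similar to track 0
    [[0.0, 1.0], [0.1, 0.9]],   # track 2, a different face
]
descriptors = [track_descriptor(t) for t in tracks]
clusters = cluster_tracks(descriptors, threshold=0.5)
print(clusters)  # [[0, 1], [2]]
```

With one descriptor per track, comparing two tracks costs a single distance computation regardless of track length, which is consistent with the speed advantage reported for this aggregation.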
