Fuzzy Analysis and Deep Convolution Neural Networks in Still-to-video Recognition

We consider the video classification problem in which feature vectors, extracted from each frame with deep convolutional neural networks, are matched against reference classes. We propose a novel recognition method based on the representation of each frame as a sequence of fuzzy sets of reference classes, whose degrees of membership are defined from the asymptotic distribution of the Kullback–Leibler information divergence and its relation to the maximum likelihood method. To increase classification accuracy, we take the fuzzy intersection (product triangular norm) of these sets. An experimental study with the YTF (YouTube Faces) and IJB-A (IARPA Janus Benchmark A) video datasets and the VGGFace, ResFace and LightCNN descriptors shows that the proposed approach increases recognition accuracy by 2–6% compared with known classification methods.
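The idea of fuzzy memberships combined by a product triangular norm can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the membership formula, so the sketch assumes that frame features are non-negative and L1-normalized (so they can be treated as discrete distributions), and that the membership of a class is proportional to `exp(-scale * divergence)`, a form suggested by the link between the Kullback–Leibler divergence and the likelihood. The names `frame_memberships`, `classify_video` and the parameter `scale` are hypothetical.

```python
import math


def normalize(v):
    """L1-normalize a non-negative feature vector (assumed preprocessing)."""
    s = sum(v)
    return [x / s for x in v]


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def frame_memberships(frame, references, scale=1.0):
    """Fuzzy set for one frame: one membership degree per reference class.

    Assumption: membership is proportional to exp(-scale * KL), normalized
    so that the degrees sum to one. `scale` is a hypothetical parameter.
    """
    divs = [kl_divergence(frame, r) for r in references]
    d_min = min(divs)  # subtract the minimum for numerical stability
    weights = [math.exp(-scale * (d - d_min)) for d in divs]
    total = sum(weights)
    return [w / total for w in weights]


def classify_video(frames, references, scale=1.0):
    """Intersect per-frame fuzzy sets with the product t-norm and pick the best class.

    The product over frames is accumulated as a sum of logarithms to avoid underflow.
    """
    log_scores = [0.0] * len(references)
    for frame in frames:
        mu = frame_memberships(frame, references, scale)
        for c, m in enumerate(mu):
            log_scores[c] += math.log(m + 1e-300)
    return max(range(len(references)), key=log_scores.__getitem__)


# Toy usage: two reference classes, two frames that resemble the second class.
references = [normalize([5.0, 1.0, 1.0]), normalize([1.0, 5.0, 1.0])]
frames = [normalize([1.0, 4.0, 1.0]), normalize([2.0, 5.0, 1.0])]
predicted = classify_video(frames, references)
```

Because the product t-norm multiplies memberships across frames, a class must be plausible in every frame to obtain a high aggregate score, which is one reason such an intersection can be more selective than averaging per-frame scores.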
