Enhance Visual Recognition Under Adverse Conditions via Deep Networks

Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous existence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited in the techniques of low-quality image restoration and high-quality image recognition tasks, respectively, few studies have been done on the important problem of recognition from very low-quality images. This paper proposes a deep learning-based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalize the conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of the results.

[1]  Ying-li Tian,et al.  Evaluation of Face Resolution for Expression Analysis , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Shaogang Gong,et al.  Recognizing facial expressions at low resolution , 2005, IEEE Conference on Advanced Video and Signal Based Surveillance, 2005..

[3]  William T. Freeman,et al.  Removing camera shake from a single photograph , 2006, SIGGRAPH 2006.

[4]  License Plate Recognition from Low-Quality Videos , 2007, MVA.

[5]  Roberto Cipolla,et al.  A manifold approach to face recognition from low quality video across illumination and pose using implicit super-resolution , 2007, ICCV 2007.

[6]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[7]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[8]  Pablo H. Hennings-Yeomans,et al.  Simultaneous super-resolution and feature extraction for recognition of low-resolution faces , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[10]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[11]  Andrzej Pacut,et al.  Face tracking and recognition in low quality video sequences with the use of particle filtering , 2009, 43rd Annual 2009 International Carnahan Conference on Security Technology.

[12]  Pascal Vincent,et al.  The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training , 2009, AISTATS.

[13]  VincentPascal,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010 .

[14]  Pong C. Yuen,et al.  Very low resolution face recognition problem , 2010, 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[15]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[16]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[17]  Thomas S. Huang,et al.  Image Super-Resolution Via Sparse Representation , 2010, IEEE Transactions on Image Processing.

[18]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[19]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[20]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[21]  Thomas S. Huang,et al.  Close the loop: Joint blind image restoration and recognition with sparse representation prior , 2011, 2011 International Conference on Computer Vision.

[22]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[23]  Xiao Zhang,et al.  Finding Celebrities in Billions of Web Images , 2012, IEEE Transactions on Multimedia.

[24]  Enhong Chen,et al.  Image Denoising and Inpainting with Deep Neural Networks , 2012, NIPS.

[25]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[26]  L. Spreeuwers,et al.  The impact of image quality on the performance of face recognition , 2012 .

[27]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[28]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Michele Nappi,et al.  Face Recognition in Adverse Conditions , 2014 .

[30]  Arun Ross,et al.  Design and evaluation of photometric image quality measures for effective face recognition , 2014, IET Biom..

[31]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[32]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Thomas S. Huang,et al.  DeepFont: Identify Your Font from An Image , 2015, ACM Multimedia.

[34]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[35]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[37]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[38]  Robust Single Image Super-Resolution via Deep Networks With Sparse Prior. , 2016, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[39]  Thomas S. Huang,et al.  Studying Very Low Resolution Recognition Using Deep Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Lina J. Karam,et al.  Understanding how image quality affects deep neural networks , 2016, 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).

[41]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Thomas S. Huang,et al.  Robust Single Image Super-Resolution via Deep Networks With Sparse Prior , 2016, IEEE Transactions on Image Processing.

[43]  Hazim Kemal Ekenel,et al.  How Image Degradations Affect Deep CNN-Based Face Recognition? , 2016, 2016 International Conference of the Biometrics Special Interest Group (BIOSIG).

[44]  Jürgen Beyerer,et al.  Low-Quality Video Face Recognition with Deep Networks and Polygonal Chain Distance , 2016, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[45]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[46]  Sangram Ganguly,et al.  Learning Sparse Feature Representations Using Probabilistic Quadtrees and Deep Belief Nets , 2015, Neural Processing Letters.

[47]  Jizheng Xu,et al.  AOD-Net: All-in-One Dehazing Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[48]  Xianming Liu,et al.  Robust Video Super-Resolution with Learned Temporal Dynamics , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Xianming Liu,et al.  When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach , 2017, IJCAI.

[50]  Xianming Liu,et al.  Learning Temporal Dynamics for Video Super-Resolution: A Deep Learning Approach , 2018, IEEE Transactions on Image Processing.