A trainable monogenic ConvNet layer robust to large contrast changes in image classification

At present, Convolutional Neural Networks (ConvNets) achieve remarkable performance in image classification tasks. However, current ConvNets cannot guarantee capabilities of mammalian visual systems such as invariance to contrast and illumination changes. Existing approaches for overcoming illumination and contrast variations usually must be tuned manually and tend to fail when tested against other types of data degradation. In this context, a new bio-inspired entry layer, M6, is presented in this work; it detects low-level geometric features (lines, edges, and orientations) similar to the patterns detected by the V1 visual cortex. This new trainable layer is capable of dealing with image classification tasks even under large contrast variations. This behavior is explained by the use of monogenic signal geometry, which represents each pixel value in a 3D space using quaternions and thereby confers a degree of explainability on the network. The M6 layer was compared with a conventional convolutional layer (C) and a deterministic quaternion local phase layer (Q9). The experimental setup is designed to evaluate the robustness of the M6-enriched ConvNet model and includes three architectures, four datasets, and three types of contrast degradation (including non-uniform haze degradations). The numerical results reveal that the models with M6 are the most robust to any kind of contrast variation. This amounts to a significant improvement over the C models, which usually perform reasonably well only when the same degradation is used for training and testing, except in the case of maximum degradation. Moreover, the Structural Similarity Index Measure (SSIM) and the Peak Signal-to-Noise Ratio (PSNR) are used to analyze and explain the robustness of the M6 feature maps under any kind of contrast degradation.
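To make the mechanism described above concrete, the sketch below computes a deterministic monogenic decomposition (a band-pass image plus its two Riesz-transform components, from which local amplitude, phase, and orientation follow) and then compares the local-phase feature maps of an image and a contrast-degraded copy with SSIM and PSNR. This is a minimal NumPy/scikit-image illustration under stated assumptions, not the authors' M6 layer: the log-Gabor parameters, the gamma-style degradation, and the helper names (log_gabor_bandpass, monogenic_features) are introduced here only for the example.

    # Minimal sketch: monogenic (Riesz) feature maps and an SSIM/PSNR check
    # of their stability under a contrast change.  Illustrative only.
    import numpy as np
    from skimage import data, img_as_float
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio

    def log_gabor_bandpass(shape, wavelength=8.0, sigma_on_f=0.55):
        """Isotropic log-Gabor band-pass filter in the frequency domain."""
        rows, cols = shape
        v = np.fft.fftfreq(rows)[:, None]
        u = np.fft.fftfreq(cols)[None, :]
        radius = np.sqrt(u**2 + v**2)
        radius[0, 0] = 1.0                      # avoid log(0) at the DC term
        f0 = 1.0 / wavelength
        lg = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
        lg[0, 0] = 0.0                          # zero DC response
        return lg, u, v, radius

    def monogenic_features(image, wavelength=8.0):
        """Local amplitude, phase, and orientation of the monogenic signal."""
        lg, u, v, radius = log_gabor_bandpass(image.shape, wavelength)
        F = np.fft.fft2(image)
        fb = np.real(np.fft.ifft2(F * lg))      # band-pass filtered image
        h1 = -1j * u / radius                   # Riesz transform transfer
        h2 = -1j * v / radius                   # functions (x and y parts)
        r1 = np.real(np.fft.ifft2(F * lg * h1))
        r2 = np.real(np.fft.ifft2(F * lg * h2))
        amplitude = np.sqrt(fb**2 + r1**2 + r2**2)
        phase = np.arctan2(np.sqrt(r1**2 + r2**2), fb)   # local phase
        orientation = np.arctan2(r2, r1)                 # local orientation
        return amplitude, phase, orientation

    if __name__ == "__main__":
        img = img_as_float(data.camera())
        degraded = img ** 2.5                   # strong, gamma-like contrast change
        _, phase_a, _ = monogenic_features(img)
        _, phase_b, _ = monogenic_features(degraded)
        rng = phase_a.max() - phase_a.min()
        print("SSIM:", structural_similarity(phase_a, phase_b, data_range=rng))
        print("PSNR:", peak_signal_noise_ratio(phase_a, phase_b, data_range=rng))

In this sketch the band-pass filtering stage is fixed; the key difference of the M6 layer, as the abstract indicates, is that this stage is trainable rather than deterministic (in contrast to the Q9 layer), while the phase-based representation is what underlies the robustness to contrast changes.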
