From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

We propose a novel antialiasing method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" ($\mathbb{R}$Max) by "complex-valued convolutions + modulus" ($\mathbb{C}$Mod), which is stable to translations. To justify our approach, we claim that $\mathbb{C}$Mod and $\mathbb{R}$Max produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, $\mathbb{C}$Mod can be considered a stable alternative to $\mathbb{R}$Max. Thus, prior to antialiasing, we force the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called a mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely trained model. Our antialiasing approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks compared to prior methods based on low-pass filtering. Arguably, its emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
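
To make the substitution concrete, the two first-layer operators can be sketched as follows. With $w$ a complex, Gabor-like filter whose real part plays the role of the trained kernel, and $\downarrow m$ denoting subsampling by a factor $m$ (the stride and pooling conventions below are illustrative assumptions, not the paper's precise definitions):

$$
\mathbb{R}\mathrm{Max}:\; x \mapsto \operatorname{MaxPool}\bigl((x \ast \operatorname{Re} w) \downarrow m\bigr), \qquad
\mathbb{C}\mathrm{Mod}:\; x \mapsto \bigl|(x \ast w) \downarrow m'\bigr|,
$$

where $m'$ is chosen so that both operators output feature maps at the same resolution. Intuitively, the modulus of a band-pass, approximately analytic response varies slowly under small translations, whereas max pooling applied to an oscillating real-valued response does not, which is why $\mathbb{C}$Mod is the more shift-stable of the two. A minimal PyTorch sketch of the two pathways follows (the function names `rmax` and `cmod` and the stride conventions are hypothetical illustrations under the assumptions above, not the authors' released code):

```python
import torch
import torch.nn.functional as F

def rmax(x, w_real, conv_stride=2, pool_size=2):
    # RMax: real-valued convolution followed by max pooling.
    y = F.conv2d(x, w_real, stride=conv_stride, padding=w_real.shape[-1] // 2)
    return F.max_pool2d(y, kernel_size=pool_size)

def cmod(x, w_real, w_imag, conv_stride=2, pool_size=2):
    # CMod: complex-valued convolution followed by a pointwise modulus.
    # The convolution stride absorbs the pooling stride so both operators
    # return feature maps at the same resolution (an illustrative choice).
    s = conv_stride * pool_size
    yr = F.conv2d(x, w_real, stride=s, padding=w_real.shape[-1] // 2)
    yi = F.conv2d(x, w_imag, stride=s, padding=w_imag.shape[-1] // 2)
    return torch.sqrt(yr ** 2 + yi ** 2 + 1e-12)  # modulus of yr + i*yi
```

For example, with `x` of shape `(1, 3, 224, 224)` and kernels of shape `(64, 3, 7, 7)`, both functions return `(1, 64, 56, 56)` feature maps; the substitution described in the abstract amounts to using `cmod` in place of `rmax` in the first layer, with `w_real` and `w_imag` forming an approximately analytic (Gabor-like) pair.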
