Learning Invariance with Compact Transforms

The problem of building machine learning models that admit efficient representations while also capturing an appropriate inductive bias for the domain has recently attracted significant interest. Existing work on compressing deep learning pipelines has explored classes of structured matrices that exhibit forms of shift-invariance akin to convolutions. We leverage the displacement rank framework to automatically learn the structured class, allowing the model to adapt to the invariances required by a given dataset while preserving asymptotically efficient matrix-vector multiplication and storage. Under a small fixed parameter budget, our broader classes of structured matrices improve final accuracy by 5–7% on standard image classification datasets compared to conventional parameter-constraining methods.
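To make the displacement rank framework concrete, the sketch below numerically checks the classical fact (due to Kailath, Kung, and Morf, and developed further by Pan) that a Toeplitz matrix has Sylvester displacement rank at most two under the shift operators Z_1 and Z_{-1}. This is a minimal illustration only, not the paper's code; the operator convention and variable names are assumptions made for the example.

```python
import numpy as np

def shift_matrix(n, f):
    """f-unit-circulant Z_f: ones on the subdiagonal, f in the top-right corner."""
    Z = np.diag(np.ones(n - 1), k=-1)
    Z[0, -1] = f
    return Z

n = 8
rng = np.random.default_rng(0)

# Random Toeplitz matrix: entry (i, j) depends only on i - j.
t = rng.standard_normal(2 * n - 1)  # t[k] stores the diagonal value for i - j = k - (n - 1)
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])

# Sylvester displacement with operators A = Z_1, B = Z_{-1}.
disp = shift_matrix(n, 1) @ T - T @ shift_matrix(n, -1)

# Toeplitz matrices have displacement rank <= 2 under these operators.
print(np.linalg.matrix_rank(disp))  # expect 2
```

In the framework the abstract refers to, learning the structured class roughly corresponds to treating the displacement operators (fixed to shift matrices in this sketch) as learnable alongside the low-rank factors of the displacement, which is what allows the matrix class to adapt to a dataset's invariances.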
