Spatial Functa: Scaling Functa to ImageNet Classification and Generation

Neural fields, also known as implicit neural representations, have emerged as a powerful means of representing complex signals across modalities. Building on this, Dupont et al. (2022) introduced a framework that views neural fields as data, termed *functa*, and proposed doing deep learning directly on datasets of neural fields. In this work, we show that this framework faces limitations when scaling up to even moderately complex datasets such as CIFAR-10. We then propose *spatial functa*, which overcome these limitations by using spatially arranged latent representations of neural fields, allowing us to scale the approach up to ImageNet-1k at 256x256 resolution. We demonstrate performance competitive with Vision Transformers (Steiner et al., 2022) on classification and with Latent Diffusion (Rombach et al., 2022) on image generation.
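The core idea of a spatially arranged latent can be illustrated with a minimal sketch: instead of one global latent vector per image, each image is represented by a grid of latent vectors, and a shared coordinate MLP is conditioned on the latent local to the queried position. The dimensions, random weights, nearest-cell lookup (the paper uses richer conditioning), and function names below are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
GRID = 8      # spatial latent grid is GRID x GRID cells
LATENT = 16   # latent channels per grid cell
HIDDEN = 32   # width of the shared coordinate MLP

# A "spatial functum": one latent vector per grid cell,
# rather than a single global latent for the whole image.
latents = rng.normal(size=(GRID, GRID, LATENT))

# Shared MLP weights (random here; learned/meta-learned in practice).
W1 = rng.normal(size=(2 + LATENT, HIDDEN)) / np.sqrt(2 + LATENT)
W2 = rng.normal(size=(HIDDEN, 3)) / np.sqrt(HIDDEN)

def query(x, y):
    """Evaluate the field at continuous coordinates (x, y) in [0, 1]^2."""
    # Look up the latent of the cell containing (x, y);
    # a real system would interpolate between neighboring cells.
    i = min(int(y * GRID), GRID - 1)
    j = min(int(x * GRID), GRID - 1)
    z = latents[i, j]
    # Condition the shared MLP on the coordinates plus the local latent.
    h = np.tanh(np.concatenate([[x, y], z]) @ W1)
    return h @ W2  # RGB value at (x, y)

rgb = query(0.25, 0.75)
print(rgb.shape)  # (3,)
```

Because the latent grid preserves spatial layout, downstream models (a classifier, or a diffusion model over latents) can exploit locality in the latent space, which is the property the global-latent functa framework lacks.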

[1] Pierluigi Zama Ramirez et al. Deep Learning on Implicit Neural Representations of Shapes. ICLR, 2023.

[2] Jiajun Wu et al. 3D Neural Field Generation Using Triplane Diffusion. CVPR, 2023.

[3] Jinwoo Shin et al. Scalable Neural Video Representations with Learnable Positional Features. NeurIPS, 2022.

[4] Walter A. Talbott et al. GAUDI: A Neural Architect for Immersive 3D Scene Generation. NeurIPS, 2022.

[5] Jonathan Ho. Classifier-Free Diffusion Guidance. arXiv, 2022.

[6] Y. Teh et al. Meta-Learning Sparse Compression Networks. Transactions on Machine Learning Research, 2022.

[7] Andreas Geiger et al. TensoRF: Tensorial Radiance Fields. ECCV, 2022.

[8] Danilo Jimenez Rezende et al. From data to functa: Your data point is a function and you can treat it like one. ICML, 2022.

[9] T. Müller et al. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 2022.

[10] B. Ommer et al. High-Resolution Image Synthesis with Latent Diffusion Models. CVPR, 2022.

[11] Benjamin Recht et al. Plenoxels: Radiance Fields without Neural Networks. CVPR, 2022.

[12] L. Gool et al. Implicit Neural Representations for Image Compression. ECCV, 2022.

[13] Jakub M. Tomczak et al. FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes. ICLR, 2022.

[14] J. Pauly et al. NeRP: Implicit Neural Representation Learning With Prior Embedding for Sparsely Sampled Image Reconstruction. IEEE Transactions on Neural Networks and Learning Systems, 2021.

[15] Y. Teh et al. Generative Models as Distributions of Functions. AISTATS, 2022.

[16] Jakub M. Tomczak et al. CKConv: Continuous Kernel Convolution For Sequential Data. ICLR, 2022.

[17] Y. Teh et al. COIN++: Data Agnostic Neural Compression. arXiv, 2022.

[18] P. Golland et al. Deep Learning on Implicit Neural Datasets. arXiv, 2022.

[19] Abhinav Shrivastava et al. NeRV: Neural Representations for Videos. NeurIPS, 2021.

[20] Jakob Uszkoreit et al. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers. Transactions on Machine Learning Research, 2022.

[21] Prafulla Dhariwal et al. Diffusion Models Beat GANs on Image Synthesis. NeurIPS, 2021.

[22] R. Ramamoorthi et al. Modulated Periodic Activations for Generalizable Local Functional Representations. ICCV, 2021.

[23] Yee Whye Teh et al. COIN: COmpression with Implicit Neural representations. ICLR, 2021.

[24] Prafulla Dhariwal et al. Improved Denoising Diffusion Probabilistic Models. ICML, 2021.

[25] Xiaolong Wang et al. Learning Continuous Image Representation with Local Implicit Image Function. CVPR, 2021.

[26] S. Gelly et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR, 2021.

[27] Pieter Abbeel et al. Denoising Diffusion Probabilistic Models. NeurIPS, 2020.

[28] Gordon Wetzstein et al. Implicit Neural Representations with Periodic Activation Functions. NeurIPS, 2020.

[29] Richard A. Newcombe et al. Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction. ECCV, 2020.

[30] Thomas Funkhouser et al. Local Implicit Grid Representations for 3D Scenes. CVPR, 2020.

[31] Pratul P. Srinivasan et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV, 2020.

[32] Gerard Pons-Moll et al. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion. CVPR, 2020.

[33] Tie-Yan Liu et al. On Layer Normalization in the Transformer Architecture. ICML, 2020.

[34] Quoc V. Le et al. RandAugment: Practical automated data augmentation with a reduced search space. CVPR Workshops, 2020.

[35] Jingbo Zhu et al. Learning Deep Transformer Models for Machine Translation. ACL, 2019.

[36] Richard A. Newcombe et al. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. CVPR, 2019.

[37] Sebastian Nowozin et al. Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019.

[38] Jaakko Lehtinen et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR, 2018.

[39] Frank Hutter et al. Fixing Weight Decay Regularization in Adam. arXiv, 2017.

[40] Lukasz Kaiser et al. Attention is All you Need. NIPS, 2017.

[41] David Ha. Generating Large Images from Latent Vectors. 2016.

[42] Sergey Ioffe et al. Rethinking the Inception Architecture for Computer Vision. CVPR, 2016.

[43] Leonidas J. Guibas et al. ShapeNet: An Information-Rich 3D Model Repository. arXiv, 2015.

[44] Thomas Brox et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, 2015.

[45] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2015.