Enhancing Neural Rendering Methods with Image Augmentations

Faithfully reconstructing 3D geometry and generating novel views of scenes are critical tasks in 3D computer vision. Despite the widespread use of image augmentations across computer vision applications, their potential remains underexplored when training neural rendering methods (NRMs) for 3D scenes. This paper presents a comprehensive analysis of image augmentations in NRMs, exploring different augmentation strategies. We find that naively introducing image augmentations during training creates geometric and photometric inconsistencies that hinder learning NRMs from images. Geometric inconsistencies arise when augmentations alter shapes, positions, and orientations, disrupting the spatial cues needed for accurate 3D reconstruction. Photometric inconsistencies arise when augmentations change pixel intensities, impairing the model's ability to capture the underlying 3D structure of the scene. We alleviate these issues by focusing on color manipulations and introducing learnable appearance embeddings that allow NRMs to explain away photometric variations. Our experiments demonstrate the benefits of incorporating augmentations when learning NRMs, including improved photometric quality and surface reconstruction, as well as enhanced robustness against data quality issues such as reduced training data and image degradations.
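
To make the appearance-embedding idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: the embedding size, layer widths, and jitter range are illustrative assumptions. It shows a color head of a NeRF-style network conditioned on a per-image learnable appearance code, plus a simple photometric (color) augmentation applied to a training image.

```python
# Minimal sketch; all hyper-parameters (embed_dim, layer widths, jitter range)
# are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn


class AppearanceConditionedColorHead(nn.Module):
    """Color branch of a NeRF-style MLP conditioned on a per-image learnable
    appearance embedding, so photometric variation introduced by color
    augmentations can be explained away on a per-training-image basis."""

    def __init__(self, num_images, feat_dim=256, embed_dim=32):
        super().__init__()
        # One learnable appearance code per training image.
        self.appearance = nn.Embedding(num_images, embed_dim)
        self.rgb = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, point_features, image_ids):
        # point_features: (N, feat_dim) features from the geometry network
        # image_ids:      (N,) index of the source image for each sample
        emb = self.appearance(image_ids)
        return self.rgb(torch.cat([point_features, emb], dim=-1))


def augment_colors(image, jitter=0.2):
    """Simple photometric augmentation: random per-image brightness scaling.
    `image` is an (H, W, 3) tensor with values in [0, 1]."""
    scale = 1.0 + (torch.rand(1) * 2.0 - 1.0) * jitter
    return (image * scale).clamp(0.0, 1.0)
```

In this sketch, geometry-altering augmentations (crops, flips, rotations) are deliberately avoided, consistent with the paper's focus on color manipulations; the appearance embedding absorbs the per-image intensity shifts so the shared geometry remains consistent across views.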
