Input Similarity from the Neural Network Perspective

Given a trained neural network, we aim to understand how similar it considers any two samples to be. To this end, we propose a definition of similarity from the neural network's perspective (i.e. we quantify how undissociable two inputs A and B are), taking a machine learning viewpoint: how much would a parameter variation designed to change the output for A also impact the output for B? We study the mathematical properties of this similarity measure and show how to estimate sample density with it at low computational complexity, enabling new types of statistical analysis for neural networks. We also propose using it during training, to enforce that examples known to be similar are also seen as similar by the network. We then study the self-denoising phenomenon encountered in regression tasks when training neural networks on datasets with noisy labels. We exhibit a multimodal image registration task in which near-perfect accuracy is reached, far beyond the label noise variance. Such a striking self-denoising effect can be explained as a noise-averaging effect over the labels of similar examples. We analyze data by retrieving the samples the network perceives as similar, and we are able to quantify the denoising effect without requiring true labels.
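As a rough illustration of the definition above: to first order, a parameter step tailored to change the output for A moves the output for B by the inner product of the two parameter-gradients, so a normalized gradient inner product is one natural way to instantiate this similarity. The PyTorch sketch below follows that reading; the function names, the scalar-output assumption, and the cosine normalization are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch


def output_gradient(model, x):
    """Flattened gradient of the (scalar) network output at x w.r.t. all parameters."""
    model.zero_grad()
    out = model(x).sum()  # assumes a scalar (or summed) output for simplicity
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])


def input_similarity(model, x_a, x_b, eps=1e-12):
    """Cosine similarity between the parameter-gradients at x_a and x_b.

    Values near 1 mean a gradient step designed to change the output for x_a
    would move the output for x_b almost as much: from the network's
    perspective, the two inputs are hard to dissociate.
    """
    g_a = output_gradient(model, x_a)
    g_b = output_gradient(model, x_b)
    return torch.dot(g_a, g_b) / (g_a.norm() * g_b.norm() + eps)


if __name__ == "__main__":
    # Toy usage: a small regression network and two random inputs.
    net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
    x_a, x_b = torch.randn(1, 8), torch.randn(1, 8)
    print(float(input_similarity(net, x_a, x_b)))
```

Computing one such pairwise value needs only two backward passes, which is consistent with the low-complexity density estimation mentioned in the abstract, though batched or approximate schemes would be needed at dataset scale.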
