Prior Activation Distribution (PAD): A Versatile Representation to Utilize DNN Hidden Units

In this paper, we introduce the concept of Prior Activation Distribution (PAD), a versatile and general technique for capturing the typical activation patterns of the hidden-layer units of a deep neural network (DNN) used for classification. We show that the combined activations of such a hidden layer have class-specific distributional properties, and we define multiple statistical measures that quantify how far a test sample's activations deviate from these distributions. Using a variety of benchmark datasets (including MNIST, CIFAR10, Fashion-MNIST, and notMNIST), we show how PAD-based measures can be used, independently of the training technique, to (a) derive fine-grained uncertainty estimates for inferences; (b) provide inference accuracy competitive with alternatives that require execution of the full pipeline; and (c) reliably isolate out-of-distribution test samples.
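
To make the deviation-based scoring concrete, below is a minimal NumPy sketch of one way PAD-style statistics could be collected and applied. The per-unit Gaussian summary, the mean absolute z-score deviation, and the out-of-distribution threshold are illustrative assumptions rather than the specific measures defined in the paper.

```python
# Illustrative sketch of PAD-style statistics (assumptions: per-unit Gaussian
# summaries per class, mean-|z| deviation score, fixed OOD threshold).
import numpy as np

def fit_pad(activations, labels, n_classes, eps=1e-8):
    """Per-class mean/std of each hidden unit's activation on training data."""
    pads = []
    for c in range(n_classes):
        acts_c = activations[labels == c]            # (n_samples_c, n_units)
        pads.append((acts_c.mean(axis=0), acts_c.std(axis=0) + eps))
    return pads

def pad_deviation(sample_acts, pads):
    """Mean absolute z-score of a test sample's activations w.r.t. each class PAD."""
    return np.array([np.abs((sample_acts - mu) / sd).mean() for mu, sd in pads])

# Toy usage with random arrays standing in for real hidden-layer activations.
rng = np.random.default_rng(0)
train_acts = rng.normal(size=(1000, 128))
train_labels = rng.integers(0, 10, size=1000)
pads = fit_pad(train_acts, train_labels, n_classes=10)

test_acts = rng.normal(size=128)
scores = pad_deviation(test_acts, pads)
pred_class = scores.argmin()          # class whose PAD the sample fits best
is_ood = scores.min() > 1.5           # large deviation from every class -> possible OOD
print(pred_class, scores.min(), is_ood)
```

Summarizing each unit independently keeps the statistics cheap to fit and to evaluate from a single forward pass up to the chosen hidden layer, which is what makes this kind of scoring usable alongside, or instead of, the rest of the classification pipeline.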
