BackPACK: Packing more into backprop
[1] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[2] Rio Yokota, et al. Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method, 2019, ICPP Workshops.
[3] Frederik Kunstner, et al. Limitations of the Empirical Fisher Approximation, 2019, NeurIPS.
[4] Frank Schneider, et al. DeepOBS: A Deep Learning Optimizer Benchmark Suite, 2019, ICLR.
[5] Philipp Hennig, et al. A Modular Approach to Block-diagonal Hessian Approximations for Second-order Optimization Methods, 2019, ArXiv.
[6] Michael Innes, et al. Don't Unroll Adjoint: Differentiating SSA-Form Programs, 2018, ArXiv.
[7] Pascal Vincent, et al. Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis, 2018, NeurIPS.
[8] Mike Innes, et al. Flux: Elegant machine learning with Julia, 2018, J. Open Source Softw.
[9] François Fleuret, et al. Not All Samples Are Created Equal: Deep Learning with Importance Sampling, 2018, ICML.
[10] Jimmy Ba, et al. Kronecker-factored Curvature Approximations for Recurrent Neural Networks, 2018, ICLR.
[11] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[12] David Barber, et al. Practical Gauss-Newton Optimisation for Deep Learning, 2017, ICML.
[13] Philipp Hennig, et al. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients, 2017, ICML.
[14] Javier Romero, et al. Coupling Adaptive Batch Sizes with Learning Rates, 2016, UAI.
[15] Andy Davis, et al. TensorFlow: A System for Large-Scale Machine Learning, 2016, OSDI.
[16] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[17] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[18] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[19] Ian J. Goodfellow, et al. Efficient Per-Example Gradient Computations, 2015, ArXiv.
[20] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[21] Barak A. Pearlmutter, et al. Automatic differentiation in machine learning: a survey, 2015, J. Mach. Learn. Res.
[22] Christian Szegedy, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[23] Philipp Hennig, et al. Probabilistic Line Searches for Stochastic Optimization, 2015, NIPS.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] Thomas Brox, et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.
[26] James Martens, et al. New perspectives on the natural gradient method, 2014, ArXiv.
[27] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, 2009, J. Mach. Learn. Res.
[28] S. Dreyfus, et al. Second-order stagewise backpropagation for Hessian-matrix analyses and investigation of negative curvature, 2008, Neural Networks.
[29] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[30] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[31] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[32] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.
[33] Kenta Oono, et al. Chainer: a Next-Generation Open Source Framework for Deep Learning, 2015.
[34] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.