Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming

The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance, and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, and higher-order derivatives. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT.
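
As a concrete illustration of the SCT approach described above, here is a minimal sketch of reverse-mode AD by source-code transformation. The hand-written derivative function stands in for the Python source an SCT tool such as Tangent would generate (via its `tangent.grad` entry point); it is representative of the technique, not Tangent's exact output.

```python
def f(x):
    return x * x + 3.0 * x

# Source-code transformation emits a new Python function that computes the
# gradient ahead of time, instead of recording a tape or building a graph at
# runtime. An SCT tool would derive source equivalent to:
def df(x, dy=1.0):
    # d/dx (x * x) = 2 * x and d/dx (3 * x) = 3; dy seeds the reverse pass
    return (2.0 * x + 3.0) * dy

assert df(2.0) == 7.0  # f'(x) = 2x + 3, so f'(2) = 7
```

Because the derivative exists as ordinary Python source, it can be read and edited directly (enabling the gradient surgery mentioned above) and fed back through the transformation to obtain higher-order derivatives.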
