Auto-Differentiating Linear Algebra

Development systems for deep learning, such as Theano, Torch, TensorFlow, or MXNet, are easy-to-use tools for creating complex neural network models. Since gradient computations are automatically baked in, and execution is mapped to high-performance hardware, these models can be trained end-to-end on large amounts of data. However, it is currently not easy to implement many basic machine learning primitives in these systems (such as Gaussian processes, least squares estimation, principal component analysis, or Kalman smoothing), mainly because they lack efficient support for linear algebra primitives as differentiable operators. We detail how a number of matrix decompositions (Cholesky, LQ, symmetric eigendecomposition) can be implemented as differentiable operators. We have implemented these primitives in MXNet, running on CPU and GPU in single and double precision. We sketch use cases for these new operators, namely learning Gaussian process and Bayesian linear regression models. Our implementation relies on BLAS/LAPACK APIs, for which highly tuned implementations are available on all major CPUs and GPUs.
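To make the Gaussian process use case concrete, the following is a minimal sketch of computing the GP negative log marginal likelihood and its gradients with respect to kernel hyperparameters, written against MXNet's autograd API and the mx.nd.linalg operators (potrf for Cholesky, trsm for triangular solves, sumlogdiag for the log-determinant term). The toy data, hyperparameter names, and shapes are illustrative assumptions for this sketch, not code from the paper.

```python
import math
import mxnet as mx
from mxnet import nd, autograd

n = 50
# Toy 1-D regression data (hypothetical, for illustration only).
x = nd.random.uniform(-1, 1, shape=(n, 1))
y = nd.sin(3 * x) + 0.1 * nd.random.normal(shape=(n, 1))

# Log-domain kernel hyperparameters we want gradients for.
log_ell = nd.zeros((1, 1))        # log lengthscale
log_sigma2 = nd.array([[-2.0]])   # log noise variance
for p in (log_ell, log_sigma2):
    p.attach_grad()

with autograd.record():
    ell = nd.exp(log_ell)
    sigma2 = nd.exp(log_sigma2)
    # Squared-exponential kernel matrix K = k(X, X) + sigma^2 I.
    diff = nd.broadcast_sub(x, x.T)  # (n, n) pairwise differences
    K = nd.exp(-0.5 * nd.broadcast_div(diff * diff, ell * ell))
    K = K + nd.broadcast_mul(nd.eye(n), sigma2)
    # Differentiable Cholesky factor L with K = L L^T.
    L = nd.linalg.potrf(K)
    # Triangular solve z = L^{-1} y, so y^T K^{-1} y = ||z||^2.
    z = nd.linalg.trsm(L, y)
    # Negative log marginal likelihood:
    #   0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi),
    # where 0.5 log|K| = sum(log(diag(L))) = sumlogdiag(L).
    nll = (0.5 * nd.sum(z * z) + nd.linalg.sumlogdiag(L)
           + 0.5 * n * math.log(2 * math.pi))

# Gradients w.r.t. the hyperparameters flow through potrf and trsm.
nll.backward()
print(log_ell.grad, log_sigma2.grad)
```

The resulting gradients can be fed to any stochastic optimizer to learn the hyperparameters; the same pattern (Cholesky factorization, triangular solve, sum of the log-diagonal) also underlies the Bayesian linear regression use case.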
