Provably Correct Automatic Subdifferentiation for Qualified Programs

The \emph{Cheap Gradient Principle}~\citep{Griewank:2008:EDP:1455489} --- the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself --- is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing sub-derivatives: widely used ML libraries, including TensorFlow and PyTorch, do \emph{not} correctly compute (generalized) sub-derivatives even on simple differentiable examples. This work considers the question: is there a \emph{Cheap Sub-gradient Principle}? Our main result shows that, under certain restrictions on our library of non-smooth functions (standard in non-linear programming), provably correct generalized sub-derivatives can be computed at a computational cost that is within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
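
As a concrete illustration of the failure mode described above (an illustrative sketch, not necessarily the paper's own example): the function $f(x) = \mathrm{relu}(x) - \mathrm{relu}(-x)$ equals $x$ everywhere, so it is differentiable with $f'(0) = 1$; yet standard automatic differentiation, which propagates the convention $\mathrm{relu}'(0) = 0$ through the chain rule, reports a derivative of $0$ at the origin. A minimal PyTorch sketch:

```python
import torch

# f(x) = relu(x) - relu(-x) is identically equal to x, so f'(x) = 1 everywhere.
x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)
f.backward()

# Autodiff applies relu'(0) = 0 in both branches and returns 0,
# which is not a valid (generalized) derivative of f at 0.
print(x.grad)  # tensor(0.)
```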

[1] J. Abadie. On the Kuhn-Tucker theorem. 1966.

[2] F. J. Gould et al. A necessary and sufficient qualification for constrained optimization. 1971.

[3] D. W. Peterson. A review of constraint qualifications in finite-dimensional spaces. 1973.

[4] F. Clarke. Generalized gradients and applications. 1975.

[5] J. Morgenstern et al. How to compute fast a function and all its derivatives: a variation on the theorem of Baur-Strassen. SIGACT News, 1985.

[6] James L. McClelland et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. 1986.

[7] Stephen Smale et al. On a theory of computation over the real numbers; NP completeness, recursive functions and universal machines. 29th Annual Symposium on Foundations of Computer Science, 1988.

[8] Griewank et al. On automatic differentiation. 1988.

[9] James Demmel et al. Applied Numerical Linear Algebra. 1997.

[10] Andreas Griewank et al. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, second edition. Frontiers in Applied Mathematics, 2000.

[11] M. Coste. An introduction to o-minimal geometry. 2002.

[12] O. Mangasarian. On concepts of directional differentiability. 2004.

[13] Yurii Nesterov. Lexicographic differentiation of nonsmooth functions. Math. Program., 2005.

[14] B. Mordukhovich. Variational Analysis and Generalized Differentiation II: Applications. 2006.

[15] Andreas Griewank et al. Who invented the reverse mode of differentiation? 2012.

[16] Paul I. Barton et al. Evaluating an element of the Clarke generalized Jacobian of a composite piecewise differentiable function. ACM TOMS, 2013.

[17] Andreas Griewank et al. On automatic differentiation and algorithmic linearization. 2014.

[18] Paul I. Barton et al. A vector forward mode of automatic differentiation for generalized derivative evaluation. Optim. Methods Softw., 2015.

[19] Luca Antiga et al. Automatic differentiation in PyTorch. 2017.

[20] Neil D. Lawrence et al. Auto-Differentiating Linear Algebra. arXiv, 2017.

[21] Stéphan Thomassé et al. On the complexity of partial derivatives. STACS, 2016.

[22] Barak A. Pearlmutter et al. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res., 2015.

[23] Andreas Griewank et al. Algorithmic differentiation for piecewise smooth functions: a case study for robust optimization. Optim. Methods Softw., 2018.

[24] Kamil A. Khan. Branch-locking AD techniques for nonsmooth composite functions and nonsmooth implicit functions. Optim. Methods Softw., 2018.