Paper Information - Deep Declarative Networks

Deep Declarative Networks

We explore a class of end-to-end learnable models wherein data processing nodes (or network layers) are defined in terms of desired behavior rather than an explicit forward function. Specifically, the forward function is implicitly defined as the solution to a mathematical optimization problem. Consistent with nomenclature in the programming languages community, we name these models deep declarative networks. Importantly, it can be shown that the class of deep declarative networks subsumes current deep learning models. Moreover, invoking the implicit function theorem, we show how gradients can be back-propagated through many declaratively defined data processing nodes thereby enabling end-to-end learning. We discuss how these declarative processing nodes can be implemented in the popular PyTorch deep learning software library allowing declarative and imperative nodes to co-exist within the same network. We also provide numerous insights and illustrative examples of declarative nodes and demonstrate their application for image and point cloud classification tasks.
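As a rough illustration of the idea in the abstract, the sketch below defines a scalar declarative node whose output is y(x) = argmin_u f(x, u), and differentiates it with the implicit function theorem, which for an unconstrained, twice-differentiable objective gives dy/dx = -(∂²f/∂u²)⁻¹ ∂²f/∂x∂u evaluated at u = y(x). The objective f(x, u) = ½(u − x)² + ¼u⁴ and the function names `solve` and `implicit_grad` are assumptions chosen for this example. They are not taken from the paper, and the code uses plain Python rather than the PyTorch implementation the abstract mentions.

```python
def solve(x, iters=50):
    """Forward pass: y(x) = argmin_u 0.5*(u - x)**2 + 0.25*u**4.

    The minimiser has no explicit forward formula here, so it is found
    numerically with Newton's method on the optimality condition df/du = 0.
    """
    u = 0.0
    for _ in range(iters):
        grad = (u - x) + u**3      # df/du
        hess = 1.0 + 3.0 * u**2    # d2f/du2, always >= 1 (strictly convex in u)
        u -= grad / hess
    return u


def implicit_grad(x):
    """Backward pass: dy/dx from the implicit function theorem.

    Only the solution y(x) is needed, not the Newton iterations
    that produced it.
    """
    y = solve(x)
    f_uu = 1.0 + 3.0 * y**2        # d2f/du2 at the solution
    f_xu = -1.0                    # d2f/dxdu for this objective
    return -f_xu / f_uu


x = 2.0
y = solve(x)                       # u + u**3 = 2 has the root u = 1
analytic = implicit_grad(x)        # 1 / (1 + 3*y**2), about 0.25 here

# Central finite difference through the solver, as an independent check.
eps = 1e-6
numeric = (solve(x + eps) - solve(x - eps)) / (2 * eps)

print(y, analytic, numeric)
```

The backward computation never unrolls the solver, so any method that reaches the same minimiser could replace Newton's method in `solve` without changing `implicit_grad`.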
