Understanding Deep Architectures with Reasoning Layer

Recently, there has been a surge of interest in combining deep learning models with reasoning in order to handle more sophisticated learning tasks. In many cases, a reasoning task can be solved by an iterative algorithm, which is then unrolled and used as a specialized layer in the deep architecture that can be trained end-to-end with the other neural components. Although such hybrid deep architectures have led to many empirical successes, their theoretical foundations, especially the interplay between the algorithm layers and the other neural layers, remain largely unexplored. In this paper, we take an initial step towards understanding such hybrid deep architectures by showing that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model. Furthermore, our analysis closely matches our experimental observations under various conditions, suggesting that the theory can provide useful guidelines for designing deep architectures with reasoning layers.
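
To make the setup concrete, the following is a minimal sketch of such an algorithm layer: a fixed number of gradient-descent iterations on a simple quadratic energy, unrolled into a differentiable module and composed with neural layers so that the whole model trains end-to-end. This is an illustrative assumption, not the architecture studied in the paper; the module name, objective, and hyperparameters are hypothetical.

```python
# Minimal sketch (assumed, not the authors' code): k unrolled gradient-descent
# steps on E(y; x) = 1/2 ||y A^T - x||^2 used as a differentiable layer.
import torch
import torch.nn as nn

class UnrolledGradientDescent(nn.Module):
    """Unrolls k gradient-descent iterations of a quadratic energy as a layer."""
    def __init__(self, dim, k=10, step_size=0.1):
        super().__init__()
        self.A = nn.Parameter(torch.eye(dim))  # problem parameters, learned end-to-end
        self.k = k
        self.step_size = step_size

    def forward(self, x):
        y = torch.zeros_like(x)                 # initial iterate of the algorithm
        for _ in range(self.k):                 # each unrolled step is an exact GD update,
            grad = (y @ self.A.T - x) @ self.A  # so the algorithm's convergence, stability,
            y = y - self.step_size * grad       # and sensitivity shape the layer's behavior
        return y

# Hybrid architecture: neural encoder -> algorithm layer -> neural decoder.
model = nn.Sequential(
    nn.Linear(8, 8),
    UnrolledGradientDescent(dim=8, k=10),
    nn.Linear(8, 1),
)
x, target = torch.randn(32, 8), torch.randn(32, 1)
loss = (model(x) - target).pow(2).mean()
loss.backward()  # gradients flow through every unrolled iteration
```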
