Convex Neural Autoregressive Models: Towards Tractable, Expressive, and Theoretically-Backed Models for Sequential Forecasting and Generation

Three features are crucial for sequential forecasting and generation models: tractability, expressiveness, and theoretical backing. While neural autoregressive models are relatively tractable and offer powerful predictive and generative capabilities, they often have complex optimization landscapes, and their theoretical properties are not well understood. To address these issues, we present convex formulations of autoregressive models with one hidden layer. Specifically, we prove an exact equivalence between these models and constrained, regularized logistic regression by using semi-infinite duality to embed the data matrix into a higher-dimensional space and by introducing inequality constraints. To make this formulation tractable, we approximate the constraints with a hinge loss or drop them altogether. Furthermore, we demonstrate that these implementations train faster than, and perform competitively with, their neural network counterparts on a variety of data sets. In doing so, we introduce techniques for deriving tractable, expressive, and theoretically interpretable models that are nearly equivalent to neural autoregressive models.
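
The abstract states the equivalence only at a high level. The sketch below illustrates the style of relaxed convex program it refers to, in the spirit of the two-layer ReLU convex reformulations of Pilanci and Ergen (ICML 2020): ReLU sign patterns of the data are enumerated (here, merely sampled), a logistic loss is minimized over group-sparse weights, and the arrangement constraints are either relaxed with a hinge penalty (as below) or dropped. All function names, hyperparameters, and the random pattern-sampling step are illustrative assumptions, not the paper's exact implementation.

import numpy as np
import torch

def sample_sign_patterns(X, num_patterns, seed=0):
    """Sample ReLU arrangement patterns D_i = 1[X g_i >= 0] from random directions g_i."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], num_patterns))
    return (X @ G >= 0).astype(np.float64)                       # shape (n, num_patterns)

def convex_relu_logistic(X, y, num_patterns=32, beta=1e-3, rho=1e-2, lr=1e-2, steps=2000):
    """Hinge-relaxed convex surrogate for a one-hidden-layer ReLU model with logistic loss.
    Illustrative sketch only; names and defaults are assumptions, not the paper's code."""
    n, d = X.shape
    D = torch.tensor(sample_sign_patterns(X, num_patterns))      # (n, P) arrangement indicators
    Xt = torch.tensor(X, dtype=torch.float64)
    yt = torch.tensor(y, dtype=torch.float64)                    # binary targets in {0, 1}

    # Group-sparse weight pairs (v_i, w_i), one pair per arrangement pattern.
    V = (0.01 * torch.randn(num_patterns, d, dtype=torch.float64)).requires_grad_()
    W = (0.01 * torch.randn(num_patterns, d, dtype=torch.float64)).requires_grad_()
    opt = torch.optim.Adam([V, W], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        XV, XW = Xt @ V.T, Xt @ W.T                              # (n, P)
        logits = (D * (XV - XW)).sum(dim=1)                      # sum_i D_i X (v_i - w_i)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, yt)
        loss = loss + beta * (V.norm(dim=1).sum() + W.norm(dim=1).sum())    # group-lasso penalty
        # Hinge relaxation of the constraints (2 D_i - I) X v_i >= 0 and (2 D_i - I) X w_i >= 0.
        gap = 2.0 * D - 1.0
        loss = loss + rho * (torch.relu(-gap * XV).mean() + torch.relu(-gap * XW).mean())
        loss.backward()
        opt.step()
    return V.detach(), W.detach(), D

# Hypothetical usage with synthetic data:
#   rng = np.random.default_rng(1)
#   X = rng.standard_normal((200, 10)); y = (X[:, 0] > 0).astype(float)
#   V, W, D = convex_relu_logistic(X, y)

Dropping the constraints altogether corresponds to setting rho = 0, which leaves an unconstrained group-lasso-regularized logistic regression over the pattern-lifted features.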
