Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery

Recurrent neural networks (RNNs) are powerful and effective for processing sequential data. However, RNNs are usually considered "black box" models whose internal structure and learned parameters are not interpretable. In this paper, we propose an interpretable RNN based on the sequential iterative soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery problem, which models a sequence of correlated observations with a sequence of sparse latent vectors. The architecture of the resulting SISTA-RNN is implicitly defined by the computational structure of SISTA, yielding a novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are perfectly interpretable as the parameters of a principled statistical model, which in this case include a sparsifying dictionary, an iterative step size, and regularization parameters. In addition, on a particular sequential compressive sensing task, the SISTA-RNN trains faster and achieves better performance than conventional state-of-the-art black box RNNs, including long short-term memory (LSTM) RNNs.
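To make the unfolding idea concrete, the sketch below shows how a few truncated iterations of soft-thresholded (ISTA-style) updates, warm-started from the previous time step's sparse code, produce a stacked recurrent computation. This is a minimal illustration under assumed names and choices: the coupling matrix F, the step-size setting, and the function unfolded_ista_step are illustrative assumptions, not the paper's exact SISTA-RNN recursion.

```python
import numpy as np

def soft_threshold(x, lam):
    # Elementwise soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def unfolded_ista_step(y, h_prev, A, D, F, alpha, lam, num_layers=3):
    # One time step of an unfolded ISTA recursion (illustrative sketch):
    # the previous time step's sparse code warm-starts the current one, and
    # each truncated iteration plays the role of one layer in a stacked RNN.
    AD = A @ D                             # measurement matrix times dictionary
    h = F @ h_prev                         # propagate the previous latent state
    for _ in range(num_layers):
        residual = AD @ h - y              # data-fit residual for this iterate
        h = soft_threshold(h - (AD.T @ residual) / alpha, lam / alpha)
    return h

# Example: recover sparse codes for a short sequence of compressed observations.
rng = np.random.default_rng(0)
m, n, p, T = 20, 64, 64, 5                 # measurements, signal dim, code dim, time steps
A = rng.standard_normal((m, n))            # measurement matrix
D = rng.standard_normal((n, p))            # sparsifying dictionary
F = np.eye(p)                              # simple identity coupling between time steps (assumption)
alpha = np.linalg.norm(A @ D, 2) ** 2      # standard ISTA step-size choice (Lipschitz constant)
h = np.zeros(p)
for y in [rng.standard_normal(m) for _ in range(T)]:
    h = unfolded_ista_step(y, h, A, D, F, alpha=alpha, lam=0.1)
```

In the unfolded network, quantities such as D, F, alpha, and lam become trainable weights, which is what makes each learned parameter interpretable as part of the underlying statistical model.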
