Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

In many problem domains, the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations; in other words, an RNN without any explicit nonlinearities but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed with straightforward linear methods: we can exactly characterize the linear contribution of each input to the model's predictions; we can use a change of basis to disentangle the input, output, and computational hidden-unit subspaces; and we can fully reverse-engineer the architecture's solution to a simple task. Despite this ease of interpretation, the input-switched affine network achieves reasonable performance on a text modeling task and allows greater computational efficiency than networks with standard nonlinearities.
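To make the architecture and the "exact linear contribution of each input" claim concrete, here is a minimal NumPy sketch. All names, shapes, and initialization scales are illustrative assumptions, not the authors' code: each input symbol k selects its own affine update h_t = W[k] h_{t-1} + b[k], and because every update is affine, the final state decomposes exactly into the propagated initial state plus one additive term per input.

```python
import numpy as np

# Minimal sketch of an input-switched affine network (ISAN).
# Each input symbol k applies its own affine map:
#   h_t = W[k] @ h_{t-1} + b[k]
# There is no explicit nonlinearity; the switching is the only
# input dependence.

rng = np.random.default_rng(0)
K, D = 5, 8                                  # vocabulary size, hidden dim (assumed)
W = rng.normal(scale=0.3, size=(K, D, D))    # one recurrent matrix per symbol
b = rng.normal(scale=0.1, size=(K, D))       # one bias per symbol

def forward(symbols, h0):
    """Run the ISAN over a symbol sequence."""
    h = h0
    for k in symbols:
        h = W[k] @ h + b[k]
    return h

def contributions(symbols, h0):
    """Exact additive decomposition of the final hidden state.

    Since every update is affine, h_T equals the initial state pushed
    through all transition matrices, plus, for each input position s,
    the bias b[x_s] pushed through all *later* transition matrices.
    """
    terms = []
    for s in range(len(symbols)):
        v = b[symbols[s]]
        for k in symbols[s + 1:]:
            v = W[k] @ v
        terms.append(v)
    init = h0
    for k in symbols:
        init = W[k] @ init
    return init, terms

seq = [3, 1, 4, 1, 0]
h0 = np.zeros(D)
init, terms = contributions(seq, h0)
# The decomposition is exact (up to floating-point rounding):
assert np.allclose(forward(seq, h0), init + sum(terms))
```

With a linear readout on h_T, each per-input term maps directly to a contribution to the model's predictions, which is what makes the analysis in the abstract a matter of ordinary linear algebra rather than post-hoc approximation.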
