Granger-causal Attentive Mixtures of Experts

Several methods have recently been proposed to detect salient input features for the outputs of neural networks. These methods offer a qualitative glimpse at feature importance, but they fall short of providing quantifiable attributions that can be compared across decisions, and they do not quantify the expected quality of their explanations. To address these shortcomings, we present an attentive mixture of experts (AME) that couples attentive gating with a Granger-causal objective to jointly produce accurate predictions as well as measures of feature importance. We demonstrate the utility of AMEs by determining factors driving demand for medical prescriptions, comparing predictive features for Parkinson's disease, and pinpointing discriminatory genes across cancer types.
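
To make the described architecture concrete, below is a minimal PyTorch sketch of an AME: one small expert network per input feature, attentive gating that yields per-expert importance weights, and a simplified Granger-causal auxiliary loss that nudges those weights toward the increase in prediction error observed when each expert's contribution is withheld. The layer sizes, the squared-error proxy, the `granger_causal_loss` helper, and the loss weight 0.1 are illustrative assumptions under this simplified setup, not the paper's reference implementation.

```python
# Minimal AME sketch (assumed shapes and hyperparameters, for illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AME(nn.Module):
    def __init__(self, num_features: int, hidden_dim: int = 16):
        super().__init__()
        # One small expert network per input feature.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden_dim), nn.ReLU())
            for _ in range(num_features)
        )
        # Attentive gating: scores each expert's hidden representation.
        self.attention = nn.Linear(hidden_dim, 1)
        # Each expert also emits its own prediction contribution.
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, 1) for _ in range(num_features))

    def forward(self, x):
        # x: (batch, num_features)
        hidden = [expert(x[:, i : i + 1]) for i, expert in enumerate(self.experts)]
        scores = torch.cat([self.attention(h) for h in hidden], dim=1)  # (batch, E)
        attn = F.softmax(scores, dim=1)                                 # feature importances
        contrib = torch.cat([head(h) for head, h in zip(self.heads, hidden)], dim=1)
        y_hat = (attn * contrib).sum(dim=1, keepdim=True)               # attention-weighted prediction
        return y_hat, attn, contrib


def granger_causal_loss(y, attn, contrib, eps=1e-8):
    # Granger-style attribution proxy: how much does the per-sample squared
    # error grow when expert i's contribution is removed from the prediction?
    y_hat_full = (attn * contrib).sum(dim=1, keepdim=True)
    err_full = (y - y_hat_full) ** 2
    drops = []
    for i in range(attn.shape[1]):
        mask = torch.ones_like(attn)
        mask[:, i] = 0.0                                                # withhold expert i
        y_hat_wo = (attn * mask * contrib).sum(dim=1, keepdim=True)
        drops.append((y - y_hat_wo) ** 2 - err_full)
    drops = torch.cat(drops, dim=1).clamp_min(0.0)
    target = drops / (drops.sum(dim=1, keepdim=True) + eps)             # normalized importances
    # Encourage the attention distribution to match the error-based importances.
    return F.kl_div((attn + eps).log(), target + eps, reduction="batchmean")


if __name__ == "__main__":
    torch.manual_seed(0)
    model = AME(num_features=4)
    x, y = torch.randn(32, 4), torch.randn(32, 1)
    y_hat, attn, contrib = model(x)
    loss = F.mse_loss(y_hat, y) + 0.1 * granger_causal_loss(y, attn, contrib)
    loss.backward()
    print(loss.item(), attn[0])
```

After training with this joint objective, the softmax attention weights `attn` double as per-feature attributions that sum to one for every decision, which is what allows them to be compared across decisions.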