Deriving Local Internal Logic for Black Box Models

Despite their widespread use, many machine learning methods produce black box models, making it hard to understand how individual features influence a prediction. We propose a novel explanation method that explains the predictions of any classifier by analyzing the change in prediction obtained when relevant subsets of attribute values are omitted. The local internal logic of the black box is captured by learning a local model in the neighborhood of the prediction to be explained. The explanations provided by our method are effective in detecting associations between attributes and the class label.
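
To make the idea concrete, the following is a minimal sketch of a perturbation-based local explainer along these lines, not the method's actual implementation. All names (`explain_instance`, `predict_proba`, `background`, `max_subset_size`) are illustrative assumptions: omitted attribute values are imputed from background data to measure prediction change, and a simple linear surrogate is fit on a sampled neighborhood of the instance.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

def explain_instance(predict_proba, x, background, target_class,
                     max_subset_size=2, n_samples=500, rng=None):
    """Hypothetical sketch: score attribute subsets by prediction change
    and fit a local surrogate model around the instance x."""
    rng = np.random.default_rng(rng)
    d = len(x)
    base_p = predict_proba(x[None, :])[0, target_class]

    # 1. Prediction change when subsets of attribute values are omitted:
    #    an "omitted" value is replaced by values drawn from background data.
    subset_effects = {}
    for size in range(1, max_subset_size + 1):
        for subset in combinations(range(d), size):
            perturbed = np.tile(x, (n_samples, 1))
            for j in subset:
                perturbed[:, j] = rng.choice(background[:, j], n_samples)
            p = predict_proba(perturbed)[:, target_class].mean()
            subset_effects[subset] = base_p - p  # drop in predicted probability

    # 2. Local internal logic: fit an interpretable model on a neighborhood of x,
    #    where each neighbor keeps a random subset of x's values.
    masks = rng.integers(0, 2, size=(n_samples, d))
    fill = background[rng.integers(0, len(background), n_samples)]
    neighbors = np.where(masks == 1, x, fill)
    y = predict_proba(neighbors)[:, target_class]
    surrogate = Ridge(alpha=1.0).fit(masks, y)  # coefficients act as local feature weights

    return subset_effects, surrogate.coef_
```

In this sketch the subset effects quantify how much withholding groups of attribute values changes the classifier's confidence, while the surrogate coefficients summarize the local association between each attribute and the predicted class.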