Explainable AI: A Hybrid Approach to Generate Human-Interpretable Explanation for Deep Learning Prediction

Abstract With massive computing power and the explosion of data as catalysts, Artificial Intelligence (AI) has finally come out of research labs to become a ground-breaking technology. Businesses see its value in a wide range of applications and are therefore looking for ways to make AI an integral part of their decision-making processes. However, to trust an AI model's prediction, or to take downstream action based on a prediction outcome, one needs to understand the reasons for that prediction. With deep neural networks increasingly becoming the algorithm of choice, generating such reasons has become more challenging. Deep neural networks are highly nested non-linear models that learn patterns in the data through complex combinations of inputs, and their complex architecture makes it very difficult to decipher the exact reasons for a prediction. Because of this lack of transparency, businesses are unable to utilize the technology in many applications. To increase the adoption of deep learning models, explainability is critical for building trust in the solution and for guiding downstream actions in business applications. In this paper we aim to create human-interpretable explanations for predictions from deep learning models. We propose a hybrid of two prior approaches, integrating clustering of the network's hidden-layer representations [2] with TREPAN decision tree extraction [1], each of which deconstructs a neural network in its own way. Our aim is to visualize the flow of information within the deep neural network using factors that make sense to humans, even if the underlying model uses more complex factors. This enables the generation of human-interpretable explanations (or reason codes) for each model outcome at the individual instance level. We demonstrate the new approach on credit card default predictions produced by a deep feed-forward neural network model and, based on our experimental results, compare and contrast it with three alternative approaches.
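
The following is a minimal sketch of the hybrid pipeline outlined above, not the paper's actual implementation: hidden-layer activations of a fitted feed-forward network are clustered, and a surrogate decision tree is then fit on the original, human-readable input features to explain cluster membership. A standard CART tree stands in for TREPAN (which additionally uses M-of-N splits and query-based sampling), and the synthetic dataset, cluster count, and the helper hidden_activations are illustrative assumptions.

    # Sketch: cluster hidden-layer representations [2], then extract an
    # interpretable surrogate tree [1] over the original input features.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Synthetic stand-in for a credit-card-default-style tabular dataset.
    X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

    # Deep feed-forward network with two ReLU hidden layers.
    net = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                        max_iter=500, random_state=0).fit(X, y)

    def hidden_activations(model, X, layer):
        """Forward-propagate X through the fitted MLP's first `layer`
        hidden layers, applying the ReLU non-linearity at each step."""
        a = X
        for w, b in zip(model.coefs_[:layer], model.intercepts_[:layer]):
            a = np.maximum(a @ w + b, 0.0)
        return a

    # Step 1: cluster the network's hidden-layer representation.
    H = hidden_activations(net, X, layer=2)
    clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(H)

    # Step 2: fit a shallow surrogate tree that maps the original,
    # human-interpretable features to the hidden-layer clusters.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, clusters)

    # An instance's explanation (reason code) is the root-to-leaf path
    # that assigns it to a cluster, stated in the original features.
    print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))

Restricting the surrogate tree's depth trades fidelity to the network for readability of the resulting reason codes; the printed rules describe each cluster of network behavior in terms of the raw inputs rather than hidden-unit activations.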