Teaching Explanations by Examples

Machine teaching is an emerging field that has recently attracted general attention in AI [7]. Briefly, machine teaching can be considered the inverse problem to machine learning: its goal is to find the smallest (optimal) training set from which a given learning algorithm produces a target model. Machine teaching has been applied in many different fields. For instance, in education the “learner” can be a human student, and the teacher has a target model (i.e., the educational goal). If we assume a cognitive learning model of the student, machine teaching can be employed to reverse-engineer the optimal training data; in other words, we obtain the data that optimises the learning process for that student, like a personalised lesson.

However, most results in the machine teaching literature only apply to concept languages whose examples have no structure. When confronted with richer languages, we find that we may teach a concept with a single example, but this example might be arbitrarily large. Looking for a more intuitive way of assessing the theoretical feasibility of teaching concepts in structured languages, in [5] we introduced the teaching size and obtained results for universal languages (e.g., Turing machines or natural language). We included an experimental validation of our method for a universal language, P3, a simple language for string manipulation. When coupled with a strong bias for simplicity, we found the remarkable result that, in many cases, teaching a concept with examples led to shorter descriptions than giving the shortest rule-based or program-based transcription of the logic of the decision. For the first time, we showed both theoretically and empirically that teaching with examples is often more efficient than giving the concept itself.

In this work we propose to explore the use of machine teaching for providing explanations of AI models. Decades of converting black boxes into white boxes have not solved the problem of extracting comprehensible explanations to justify the decisions made by a model: either these models oversimplify the problem or they are not assimilated by humans, or both. This is not only because the techniques in explainable AI ignore the psychology of the recipient (the explainee) but also because they ignore the way in which concepts can be easily transmitted from one language of representation to another. Machine teaching techniques can be employed to extract significant instances from AI systems and ML models that can be used to (1) give humans a better understanding of the behaviour
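To make the teaching-size notion above concrete, the following sketch shows the basic machine-teaching loop in miniature: a brute-force teacher searches for the smallest set of labelled examples that makes a simplicity-biased learner output a target concept, and the total size of that witness set plays the role of the teaching size. The hypothesis space, the example pool, the size measures and all identifiers are hypothetical toy choices for illustration only; they are not the P3 setting evaluated in [5].

```python
# Toy sketch of machine teaching with a teaching-size objective.
# A brute-force teacher looks for the smallest labelled example set
# that makes a simplicity-biased learner output the target concept.
from itertools import combinations

# Toy hypothesis space over strings: (name, predicate, description length).
HYPOTHESES = [
    ("starts_with_a", lambda s: s.startswith("a"), 14),
    ("ends_with_b",   lambda s: s.endswith("b"),   12),
    ("length_le_2",   lambda s: len(s) <= 2,       11),
    ("contains_ab",   lambda s: "ab" in s,         11),
]

POOL = ["a", "b", "ab", "ba", "abb", "bab", "aaa"]  # candidate examples


def example_size(s):
    """Cost of giving one labelled example: string length plus one label bit."""
    return len(s) + 1


def learner(labelled_examples):
    """Simplicity-biased learner: return the shortest hypothesis consistent
    with all labelled examples (ties broken by listing order)."""
    consistent = [h for h in sorted(HYPOTHESES, key=lambda h: h[2])
                  if all(h[1](x) == y for x, y in labelled_examples)]
    return consistent[0][0] if consistent else None


def teaching_size(target_name):
    """Smallest total size of a labelled example set (witness set) that makes
    the learner output the target concept, found by brute-force search."""
    target = next(h for h in HYPOTHESES if h[0] == target_name)
    best = None
    for k in range(1, len(POOL) + 1):
        for subset in combinations(POOL, k):
            witness = [(x, target[1](x)) for x in subset]
            if learner(witness) == target_name:
                size = sum(example_size(x) for x in subset)
                if best is None or size < best[0]:
                    best = (size, witness)
    return best


if __name__ == "__main__":
    size, witness = teaching_size("contains_ab")
    print(f"teaching size = {size}, witness set = {witness}")
```

In this toy configuration the search returns a single short negative example, which is already enough for the simplicity-biased learner to settle on the target concept; this mirrors, in miniature, the observation that teaching with examples can be more compact than writing out the concept itself.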