Integrating Guided Clustering in Visual Analytics to Support Domain Expert Reasoning Processes

When Information Visualization (IV) and Machine Learning (ML) are combined to assist data analysis conducted by domain experts, the goal is often to leverage the experts’ domain knowledge in the underlying ML algorithms. We present an analytical process and a visual analytics tool that uses visual queries to capture examples from the domain experts’ existing reasoning process and uses them to guide the subsequent clustering. In collaboration with personnel at the Danish Business Authority, we found that their analytical reasoning processes often start with examples or risk factors derived from previous cases. Given the nature of the available examples, the resulting labeling of the companies is only partial, which can be challenging to cope with in ML. Concretely, we found that the knowledge provided by the auditors exhibits two distinct characteristics: