On Interaction in Data Mining

One of the grand challenges in our networked world is dealing with large, weakly structured and unstructured data sets. This is most evident in Biomedicine (Medical Informatics + Bioinformatics): the trend towards personalized medicine results in increasingly large amounts of (-omics) data. In the life sciences domain, most data models are characterized by complexity, which makes manual analysis very time-consuming and often practically impossible. To deal with such data, solutions from the machine learning community are indispensable, and it is marvelous what sophisticated algorithms can do within high-dimensional spaces. We want to enable domain-expert end users to work interactively with these algorithms and data, so as to support novel discoveries and previously unknown insights. Our quest is to make such approaches interactive, so that a computational non-expert can gain insight into the data and find a starting point: “What is interesting?” When mapping results back from arbitrarily high-dimensional spaces R^n into R^2, there is always the danger of modeling artifacts, which may be interpreted wrongly. A synergistic combination of methodologies and approaches from two areas offers ideal conditions for working on solutions to such problems: Human-Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD), with the goal of supporting human intelligence with machine intelligence. Both fields have many unexplored, complementary intersections, and the aim is to combine the strengths of automatic, computer-based methods, both in time and space, with the strengths of human perception and cognition, e.g. in discovering patterns, trends, similarities, anomalies, etc. in data.