On the Challenges and Opportunities in Visualization for Machine Learning and Knowledge Extraction: A Research Agenda

We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety, and value of data has been significantly transforming the way that scientific research is carried out and businesses operate. Data science has emerged as a practice to enable this data-intensive innovation by bringing together and advancing knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization. Within data science, visualization plays a unique and perhaps the ultimate role as an approach to facilitate cooperation between humans and computers, and in particular to enable the analysis of diverse and heterogeneous data using complex computational methods whose algorithmic results are challenging to interpret and operationalize. Whilst algorithm development is surely at the center of the whole pipeline in disciplines such as machine learning and knowledge discovery, it is visualization that ultimately makes the results accessible to the end user. Visualization can thus be seen as a mapping from arbitrarily high-dimensional abstract spaces to lower dimensions, and it plays a central and critical role in interacting with machine learning algorithms, particularly in interactive machine learning (iML), which includes the human in the loop. The central goal of the CD-MAKE VIS workshop is to spark discussions at this intersection of visualization, machine learning, and knowledge discovery and to bring together experts from these disciplines. This paper discusses a perspective on the challenges and opportunities in the integration of these disciplines and presents a number of directions and strategies for further research.
