Interactive Machine Learning by Visualization: A Small Data Solution

Machine learning algorithms and traditional data mining processes usually require a large volume of data to train algorithm-specific models, with little or no user feedback during model building. Such a "big data" based automatic learning strategy is sometimes unrealistic for applications where data collection or processing is very expensive or difficult, such as clinical trials. Furthermore, expert knowledge can be very valuable in the model building process in fields such as the biomedical sciences. In this paper, we propose a new visual analytics approach to interactive machine learning and visual data mining. In this approach, multi-dimensional data visualization techniques are employed to facilitate user interaction with the machine learning and mining process. This allows dynamic user feedback in different forms, such as data selection, data labeling, and data correction, to enhance the efficiency of model building. In particular, this approach can significantly reduce the amount of data required to train an accurate model, and can therefore be highly impactful for applications where large amounts of data are hard to obtain. The proposed approach is tested on two application problems: the handwriting recognition (classification) problem and the human cognitive score prediction (regression) problem. Both experiments show that visualization-supported interactive machine learning and data mining can achieve the same accuracy as an automatic process while using much smaller training data sets.
