User-Based Visual-Interactive Similarity Definition for Mixed Data Objects - Concept and First Implementation

The definition of similarity between data objects plays a key role in many analytical systems. Defining similarity is challenging for three main reasons: different stakeholders, mixed data, and changing requirements. First, in many applications the developers of the analytical system (data scientists) model the similarity, while the users (domain experts) hold distinct mental notions of similarity. Second, similarity is hard to define for objects with mixed data types. Third, many systems use static similarity models that cannot adapt to changing data or user needs. We present a concept for developing systems that support visual-interactive similarity definition for mixed data objects, emphasizing 15 crucial steps. For each step, we present different design considerations and implementation variants, revealing a large design space. Moreover, we present a first implementation of our concept that enables domain experts to express their mental similarity notions through a visual-interactive system. The implementation tackles the different-stakeholders problem, the mixed-data problem, and the changing-requirements problem. It is not limited to a specific mixed data set; we show its applicability in a case study in which a functional similarity model is trained for countries as objects.
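To make the mixed-data problem concrete, the following minimal sketch computes a weighted, Gower-style similarity between two objects that have both numerical and categorical attributes. It is an illustration under stated assumptions, not the similarity model of the presented implementation: the function name `mixed_similarity`, the attribute names, the value ranges, and the weights are all hypothetical, and the object values are toy numbers rather than real country data.

```python
from typing import Dict, Union

Value = Union[float, str]


def mixed_similarity(
    a: Dict[str, Value],
    b: Dict[str, Value],
    ranges: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Weighted Gower-style similarity in [0, 1] (hypothetical helper).

    Attributes listed in `ranges` are treated as numerical and compared by
    range-normalized absolute difference; all other attributes are treated
    as categorical and compared by exact match.
    """
    total = 0.0
    weight_sum = 0.0
    for attr, w in weights.items():
        x, y = a[attr], b[attr]
        if attr in ranges:
            r = ranges[attr]
            # Numerical attribute: 1 for identical values, 0 at full range.
            s = max(0.0, 1.0 - abs(x - y) / r) if r > 0 else 1.0
        else:
            # Categorical attribute: simple overlap (match / no match).
            s = 1.0 if x == y else 0.0
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy objects with made-up values (assumption, not real data).
obj_a = {"population": 10.0, "continent": "Europe"}
obj_b = {"population": 30.0, "continent": "Europe"}
ranges = {"population": 100.0}

equal_weights = {"population": 1.0, "continent": 1.0}
user_weights = {"population": 3.0, "continent": 1.0}

sim_equal = mixed_similarity(obj_a, obj_b, ranges, equal_weights)
sim_user = mixed_similarity(obj_a, obj_b, ranges, user_weights)
```

In this sketch the attribute weights are the adjustable part: changing them changes which objects count as similar, which is one simple way a static model could be replaced by one that follows the user's notion of similarity.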
