Employing artificial neural networks for constructing metadata-based model to automatically select an appropriate data visualization technique

A solution to automatically select an appropriate visualization technique based on metadata is presented. A purpose-built dataset extracted from existing knowledge in the field is used to train classifiers. The results obtained from the best ANN architecture are compared with five other classifiers. The proposed system outperforms four classifiers in terms of accuracy and five classifiers in terms of running time. The work brings a new perspective to the field of visualization.

Advances in computing technology have been instrumental in creating an assortment of powerful information visualization techniques. However, selecting a suitable and effective visualization technique for a specific dataset and data mining task is not trivial. This work automatically selects an appropriate visualization technique based on the given metadata and the task that a user intends to perform. The appropriate visualization is predicted by an artificial neural network (ANN)-based model that classifies the input data into one of eight predefined classes. A purpose-built dataset extracted from existing knowledge in the discipline is used to train the neural network. The dataset covers eight visualization techniques: histogram, line chart, pie chart, scatter plot, parallel coordinates, map, treemap, and linked graph. Various architectures using different numbers of hidden units, hidden layers, and input and output data formats have been evaluated to find the optimal neural network architecture. The performance of the neural networks is measured using the confusion matrix, accuracy, precision, and sensitivity of the classification. The optimal neural network architecture is determined by convergence time and number of iterations. The results obtained from the best ANN architecture are compared with five other classifiers: k-nearest neighbor, naïve Bayes, decision tree, random forest, and support vector machine.
The proposed system outperforms four classifiers in terms of accuracy and all five classifiers based on execution time. The trained neural network is also tested on twenty real-world benchmark datasets, where the proposed approach also provides two alternate visualizations, in addition to the most suitable one, for a particular dataset. A qualitative comparison with the state-of-the-art approaches is also presented. The results show that the proposed technique assists in selecting an appropriate visualization technique for a given dataset with high accuracy.
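
The evaluation measures named in the abstract (confusion matrix, accuracy, precision, and sensitivity) and the idea of returning the most suitable visualization plus two alternates can be illustrated with a minimal sketch. The function names, the toy labels, and the assumption that the alternates come from ranking the network's per-class output scores are all hypothetical; the abstract does not state how the alternates are derived or how the metrics were implemented.

```python
from typing import List, Sequence

# The eight visualization classes listed in the abstract.
CLASSES = [
    "histogram", "line chart", "pie chart", "scatter plot",
    "parallel coordinates", "map", "treemap", "linked graph",
]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision(matrix: List[List[int]], k: int) -> float:
    """Of the samples predicted as class k, the fraction that are class k."""
    predicted = sum(matrix[i][k] for i in range(len(matrix)))
    return matrix[k][k] / predicted if predicted else 0.0


def sensitivity(matrix: List[List[int]], k: int) -> float:
    """Of the samples that truly are class k, the fraction predicted as k."""
    actual = sum(matrix[k])
    return matrix[k][k] / actual if actual else 0.0


def top_k_visualizations(scores: Sequence[float], k: int = 3) -> List[str]:
    """Assumption: rank hypothetical per-class scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [CLASSES[i] for i in order[:k]]


# Toy example with made-up labels (class indices into CLASSES).
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))

# Hypothetical network output for one dataset's metadata.
scores = [0.05, 0.10, 0.02, 0.50, 0.20, 0.03, 0.06, 0.04]
suggestions = top_k_visualizations(scores)
```

Here `suggestions[0]` plays the role of the most suitable technique and the remaining two entries the alternates; per-class precision and sensitivity are computed one class at a time from the same confusion matrix.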
