Using Visualization to improve Clustering Analysis on Heterogeneous Information Network

The exploration and analysis of data mining methodologies is an important task for effective knowledge discovery, especially in today's heterogeneous information networks. Previous approaches to optimizing mining algorithms aim primarily at improving time complexity, space complexity, accuracy, and robustness. We extend the state-of-the-art method by concentrating on usability and algorithm understandability. Specifically, we use RankClus, a classic clustering algorithm, as an example. By displaying the otherwise hidden computing processes in visual form, we make the whole clustering process transparent to users, which may help them understand more clearly and quickly how the algorithm computes its results and how objects influence one another. In addition, we use a density approach to simplify the discovery of data patterns intuitively, and through the visualized results users can adjust algorithm parameters with or without professional training. Finally, we use two further visual techniques to improve visualization quality: a heatmap matrix designed for checking the similarities of objects in the same cluster, and a DOITree implemented to further analyze the accuracy of the algorithms.
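As an illustration of the heatmap matrix idea mentioned above, the following is a minimal sketch of how a cluster-ordered similarity matrix could be prepared before it is drawn. It is not the authors' implementation: the choice of cosine similarity, the feature-vector representation of objects, and the function names `cosine` and `cluster_ordered_similarity` are all assumptions made for this example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def cluster_ordered_similarity(vectors, labels):
    """Return (order, matrix) with objects grouped by cluster label.

    Grouping rows and columns by cluster makes objects of the same
    cluster adjacent, so a coherent cluster shows up as a bright
    block on the diagonal when the matrix is rendered as a heatmap.
    """
    order = sorted(range(len(vectors)), key=lambda i: (labels[i], i))
    matrix = [[cosine(vectors[i], vectors[j]) for j in order] for i in order]
    return order, matrix


# Hypothetical toy data: four objects, two clusters.
vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
labels = [0, 1, 0, 1]
order, matrix = cluster_ordered_similarity(vectors, labels)
```

The resulting `matrix` can be passed to any heatmap renderer. Rows and columns that fall in the same cluster but show low similarity values would point to objects worth inspecting.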
