Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?

It is widely accepted that data analytics has entered the era of Big Data. In particular, the field must deal with so-called Big Graph Data, which is the focus of this paper. Graph data arises in many fields, such as social networks, biological networks, and computer networks. Data analysts benefit from interactive, real-time exploration techniques such as clustering and the ability to zoom into clusters. However, although clustering is a key aspect of graph data analysis, scalable graph clustering algorithms that could support such interactive techniques are lacking. This paper presents an approach that combines graph clustering with embedding the graph in a coordinate system, and that shows promising results in initial experiments. The approach also incorporates both structural and attribute information, which can lead to more meaningful clusters.
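To make the idea of clustering in an embedded coordinate space concrete, the following is a minimal sketch, not the method evaluated in the paper. The abstract does not specify the embedding or the clustering algorithm, so the sketch assumes that each node's coordinates are its hop distances to a few landmark nodes, that node attributes are numeric vectors appended to those coordinates, and that plain k-means is run on the result. The names `embed`, `kmeans`, `landmarks`, `seeds`, and the weight `alpha` are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embed(adj, attrs, landmarks, alpha=0.5):
    """Map each node to a point: landmark distances (structure) plus attributes.

    alpha is an assumed weight balancing structure against attributes.
    """
    per_landmark = [bfs_distances(adj, lm) for lm in landmarks]
    unreachable = len(adj)  # larger than any real hop distance in the graph
    points = {}
    for node in adj:
        structural = [d.get(node, unreachable) for d in per_landmark]
        points[node] = ([alpha * s for s in structural]
                        + [(1 - alpha) * a for a in attrs[node]])
    return points


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, iterations=20):
    """Plain Lloyd's k-means; seeds are node ids used as initial centers."""
    centers = [list(points[s]) for s in seeds]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each node goes to its nearest center.
        labels = {
            node: min(range(len(centers)),
                      key=lambda c: squared_distance(p, centers[c]))
            for node, p in points.items()
        }
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [points[n] for n, lab in labels.items() if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy graph: two triangles joined by the edge 2-3, one numeric attribute each.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
attrs = {0: [0.0], 1: [0.0], 2: [0.0], 3: [1.0], 4: [1.0], 5: [1.0]}

points = embed(adj, attrs, landmarks=[0, 5])
labels = kmeans(points, seeds=[0, 5])
print(labels)
```

On this toy graph the two triangles end up in separate clusters. The scalability argument is that, once nodes have coordinates, the distance between two nodes is a cheap vector operation rather than a graph traversal; whether this holds up on large graphs depends on the embedding actually used.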
