Efficient large graph pattern mining for big data in the cloud

Mining big graph data is an important problem in graph mining research. Although cloud computing is effective at solving traditional algorithmic problems, mining frequent patterns in a massive graph with cloud computing still faces three challenges: 1) the graph partition problem, 2) asymmetry of information, and 3) pattern-preservation merging. This paper therefore presents a new approach, cloud-based SpiderMine (c-SpiderMine), which exploits cloud computing to mine large patterns in big graph data. The proposed method addresses these three issues in implementing a big graph data mining algorithm in the cloud. We conduct experiments on three real data sets, and the results demonstrate that c-SpiderMine significantly reduces execution time and achieves high scalability when processing big data in the cloud.
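The abstract does not describe how c-SpiderMine partitions the graph. As a minimal illustration of why partitioning is a challenge at all, assume a hypothetical setting in which each vertex is assigned to one worker and each worker can mine only the edges whose two endpoints it owns. The Python sketch below compares two hypothetical assignment schemes (`hash_partition`, `range_partition`) on a 6-vertex cycle and separates each worker's local edges from the cut edges that cross workers. It is an illustration of the general problem, not the authors' partitioning or merging method.

```python
from collections import defaultdict


def hash_partition(num_vertices, k):
    # Hypothetical scheme: vertex v is assigned to worker v mod k.
    return {v: v % k for v in range(num_vertices)}


def range_partition(num_vertices, k):
    # Hypothetical scheme: contiguous blocks of vertex ids per worker.
    size = -(-num_vertices // k)  # ceiling division
    return {v: v // size for v in range(num_vertices)}


def split_edges(edges, assignment):
    """Return (local edges per worker, cut edges whose endpoints lie on different workers)."""
    local = defaultdict(list)
    cut = []
    for u, v in edges:
        if assignment[u] == assignment[v]:
            local[assignment[u]].append((u, v))
        else:
            cut.append((u, v))
    return dict(local), cut


def cycle_edges(n):
    # Undirected n-cycle 0-1-...-(n-1)-0, one tuple per edge.
    return [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    n, k = 6, 2
    edges = cycle_edges(n)
    for name, part in [("hash", hash_partition(n, k)), ("range", range_partition(n, k))]:
        local, cut = split_edges(edges, part)
        print(name, "local:", local, "cut:", cut)
```

Under the hash scheme every edge of the cycle is cut, so no worker holds any edge; under the range scheme two edges are cut and each worker holds a 3-vertex path. In both cases no single worker sees the whole cycle, so a pattern of that shape can only be recovered if the per-worker results are merged together with the cut-edge information, which is the kind of problem the abstract's partition and pattern-preservation merging challenges refer to.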