A Label-Based Partitioning Strategy for Mining Link Patterns

With the explosive growth of online linked data, the task of mining link patterns is attracting increasing attention. A practical issue is how to perform mining efficiently on large-scale linked data. Existing pattern mining algorithms usually assume that the dataset fits into main memory, whereas linked data comprising billions of triples far exceeds that limit. In this paper we present a pilot study of a novel partitioning strategy for mining link patterns in large-scale linked data. First, we propose an algorithm named Par Group that divides large linked data and groups it into partitions based on vertex labels. Second, an adapted gSpan is applied to mine link patterns in each partition. Finally, the discovered link patterns are merged into a global result set. Experiments show that our strategy is feasible and promising in some scenarios.
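
The three steps above (partition by vertex label, mine each partition, merge the results) can be illustrated with a minimal Python sketch. This is not the paper's Par Group algorithm or its adapted gSpan: the function names, the rule of assigning each triple to the partition of its subject's label, and the single-edge pattern counter standing in for the miner are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def partition_by_label(triples, labels):
    """Group triples by the label of their subject vertex.

    Assumption: a simplified stand-in for label-based partitioning,
    not the Par Group algorithm itself.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[labels[s]].append((s, p, o))
    return dict(partitions)


def mine_partition(triples, labels):
    """Count single-edge patterns (subject label, predicate, object label).

    Assumption: a toy replacement for the adapted gSpan miner, which
    would also grow multi-edge patterns.
    """
    return Counter((labels[s], p, labels[o]) for s, p, o in triples)


def mine_link_patterns(triples, labels, min_support):
    """Mine every partition separately, then merge into one result set."""
    merged = Counter()
    for part in partition_by_label(triples, labels).values():
        merged.update(mine_partition(part, labels))
    # Keep only patterns that reach the support threshold globally.
    return {pat: n for pat, n in merged.items() if n >= min_support}


# Hypothetical toy data: three people and two documents.
labels = {"a1": "Person", "a2": "Person", "a3": "Person",
          "d1": "Doc", "d2": "Doc"}
triples = [
    ("a1", "knows", "a2"),
    ("a2", "knows", "a3"),
    ("a1", "wrote", "d1"),
    ("a3", "wrote", "d2"),
    ("d1", "cites", "d2"),
]
patterns = mine_link_patterns(triples, labels, min_support=2)
```

In this sketch every single-edge pattern falls entirely within one partition, so the merged counts are exact. Patterns that span several partitions are not handled here and would need additional treatment in the merge step.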
