Discovering Frequent Graph Patterns Using Disjoint Paths

Whereas data mining in structured data focuses on frequent data values, semistructured and graph data mining focuses on frequent labels and common topologies. Here the structure of the data is just as important as its content. We study the problem of discovering typical patterns in graph data, a task made difficult by the complexity of the required subtasks, especially subgraph isomorphism. In this paper, we propose a new apriori-based algorithm for mining graph data, in which the basic building blocks are relatively large, disjoint paths. The algorithm is proven to be sound and complete. Empirical evidence shows practical advantages of our approach for certain categories of graphs.
