On the Usefulness of Weight-Based Constraints in Frequent Subgraph Mining

Frequent subgraph mining is an important data-mining technique. In this paper we look at weighted graphs, which are ubiquitous in the real world. The analysis of weights in combination with mining for substructures might yield more precise results. In particular, we study frequent subgraph mining in the presence of weight-based constraints and explain how to integrate them into mining algorithms. While such constraints only yield approximate mining results in most cases, we demonstrate that such results are useful nevertheless and explain this effect. To do so, we both assess the completeness of the approximate result sets, and we carry out application-oriented studies with real-world data-analysis problems: software-defect localization and explorative mining in transportation logistics. Our results are that the runtime can improve by a factor of up to 3.5 in defect localization and 7 in explorative mining. At the same time, we obtain an even slightly increased defect-localization precision and obtain good explorative mining results.

[1]  Ian F. Darwin Java Cookbook , 2001 .

[2]  Yuji Matsumoto,et al.  An Application of Boosting to Graph Classification , 2004, NIPS.

[3]  Klemens Böhm,et al.  Mining Edge-Weighted Call Graphs to Localise Software Bugs , 2008, ECML/PKDD.

[4]  Jiawei Han,et al.  Discovery of Frequent Substructures , 2006 .

[5]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.

[6]  Philip S. Yu,et al.  Feature-based similarity search in graph structures , 2006, TODS.

[7]  Chris Clifton,et al.  Knowledge discovery from transportation network data , 2005, 21st International Conference on Data Engineering (ICDE'05).

[8]  Mohammad Al Hasan,et al.  ORIGAMI: Mining Representative Orthogonal Graph Patterns , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[9]  Christian Borgelt,et al.  A Decision Tree Plug-In for DataEngine , 2004 .

[10]  Takashi Washio,et al.  Complete Mining of Frequent Patterns from Graphs: Mining Graph Data , 2003, Machine Learning.

[11]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[12]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[13]  Nicole Krämer,et al.  Partial least squares regression for graph mining , 2008, KDD.

[14]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[15]  Laks V. S. Lakshmanan,et al.  Pushing Convertible Constraints in Frequent Itemset Mining , 2004, Data Mining and Knowledge Discovery.

[16]  Klemens Böhm,et al.  Software-Bug Localization with Graph Mining , 2010, Managing and Mining Graph Data.

[17]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[18]  Frans Coenen,et al.  Text classification using graph mining-based feature extraction , 2010 .

[19]  Philip S. Yu,et al.  gPrune: A Constraint Pushing Framework for Graph Pattern Mining , 2007, PAKDD.

[20]  Chen Wang,et al.  Constraint-Based Graph Mining in Large Database , 2005, APWeb.

[21]  Andreas Zeller,et al.  Why Programs Fail: A Guide to Systematic Debugging , 2005 .

[22]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[23]  Ian Witten,et al.  Data Mining , 2000 .

[24]  Ambuj K. Singh,et al.  GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[25]  Sebastian Nowozin,et al.  Weighted Substructure Mining for Image Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[27]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[28]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[29]  Jian Pei,et al.  Mining sequential patterns with constraints in large databases , 2002, CIKM '02.

[30]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.