Signature Pattern Covering via Local Greedy Algorithm and Pattern Shrink

Pattern mining is a fundamental problem that has a wide range of applications. In this paper, we study the problem of finding a minimum set of signature patterns that explain all data. In the problem, we are given objects where each object has an item set and a label. A pattern is called a signature pattern if all objects with the pattern have the same label. This problem has many interesting applications such as assertion mining in hardware design and identifying failure causes from various log data. We show that the previous pattern mining methods are not suitable for mining signature patterns and identify the problems. Then we propose a novel pattern enumeration method which we call Pattern Shrink. Our method is strongly coupled with another novel method that is very similar to finding a local optimum with a negligible loss in performance. Our proposed methods show a speedup of more than ten times over the previous methods. Our methods are flexible enough to be extended to mining high confidence patterns, instead of signature patterns.

[1]  Jiawei Han,et al.  Dustminer: troubleshooting interactive complexity bugs in sensor networks , 2008, SenSys '08.

[2]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[3]  Luc De Raedt,et al.  Correlated itemset mining in ROC space: a constraint programming approach , 2009, KDD.

[4]  BouléMarc,et al.  Automata-based assertion-checker synthesis of PSL properties , 2008 .

[5]  Philip S. Yu,et al.  Direct Discriminative Pattern Mining for Effective Classification , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[6]  Nicole Krämer,et al.  Partial least squares regression for graph mining , 2008, KDD.

[7]  Andreas Zeller,et al.  Simplifying and Isolating Failure-Inducing Input , 2002, IEEE Trans. Software Eng..

[8]  Zeljko Zilic,et al.  Assertion Checkers in Verification, Silicon Debug and In-Field Diagnosis , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).

[9]  Mohammed J. Zaki Efficiently mining frequent trees in a forest: algorithms and applications , 2005, IEEE Transactions on Knowledge and Data Engineering.

[10]  Aarti Gupta Assertion-based verification turns the corner , 2002, IEEE Des. Test Comput..

[11]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[12]  James Bailey,et al.  Mining Minimal Contrast Subgraph Patterns , 2006, SDM.

[13]  H. Cleve,et al.  Locating causes of program failures , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[14]  David Tcheng,et al.  GoldMine: Automatic assertion generation using data mining and static analysis , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[15]  Sebastian Nowozin,et al.  gBoost: a mathematical programming approach to graph classification and regression , 2009, Machine Learning.

[16]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[17]  Sanjay J. Patel,et al.  Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.

[18]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[19]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[20]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[21]  Philip S. Yu,et al.  Direct mining of discriminative and essential frequent patterns via model-based search tree , 2008, KDD.

[22]  Kotagiri Ramamohanarao,et al.  The Space of Jumping Emerging Patterns and Its Incremental Maintenance Algorithms , 2000, ICML.

[23]  Siegfried Nijssen,et al.  Pattern-Based Classification: A Unifying Perspective , 2011, ArXiv.

[24]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[25]  Sangkyum Kim,et al.  NDPMine: Efficiently Mining Discriminative Numerical Features for Pattern-Based Classification , 2010, ECML/PKDD.

[26]  Harry D. Foster,et al.  Assertion-Based Design , 2010 .