Bidirectional mining of non-redundant recurrent rules from a sequence database

We study scalable mining of a non-redundant set of significant recurrent rules from a sequence database. A recurrent rule has the form “whenever a series of precedent events occurs, eventually a series of consequent events occurs”. Such rules are intuitive and characterize behaviors in many domains. One example is software specification, where the rules capture a family of properties useful for program verification and bug detection. We build on earlier work on mining recurrent rules by Lo, Khoo, and Liu to make mining more scalable, proposing a new set of pruning properties embedded in a new mining algorithm. Performance studies and case studies on benchmark synthetic and real datasets show that our approach outperforms the state-of-the-art approach to mining recurrent rules by up to two orders of magnitude.
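To make the rule form concrete, the following is a minimal sketch of how a single recurrent rule could be checked against a small sequence database. It is one simplified reading of "whenever the precedent occurs, eventually the consequent occurs" and not the mining algorithm or pruning properties proposed here. The function names (`is_subsequence`, `temporal_points`, `rule_confidence`), the confidence measure, and the lock/unlock events are all illustrative assumptions.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], events: Sequence[str]) -> bool:
    """True if `pattern` occurs in `events` in order, gaps allowed."""
    it = iter(events)
    return all(any(e == p for e in it) for p in pattern)


def temporal_points(premise: Sequence[str], seq: Sequence[str]) -> List[int]:
    """Positions j where the premise has just been completed.

    Assumption: j qualifies when seq[j] equals the last premise event and the
    rest of the premise is a subsequence of seq[:j]. The premise is non-empty.
    """
    return [
        j
        for j, event in enumerate(seq)
        if event == premise[-1] and is_subsequence(premise[:-1], seq[:j])
    ]


def rule_confidence(
    premise: Sequence[str],
    consequent: Sequence[str],
    database: Sequence[Sequence[str]],
) -> float:
    """Fraction of premise completions that are eventually followed by the consequent."""
    total = 0
    satisfied = 0
    for seq in database:
        for j in temporal_points(premise, seq):
            total += 1
            if is_subsequence(consequent, seq[j + 1:]):
                satisfied += 1
    return satisfied / total if total else 0.0


# Hypothetical traces: the second "lock" in the first trace is never released.
db = [
    ["lock", "use", "unlock", "lock", "use"],
    ["lock", "unlock"],
]
confidence = rule_confidence(["lock"], ["unlock"], db)
```

In this toy database the rule "whenever lock occurs, eventually unlock occurs" holds at two of the three points where `lock` appears, so a rule miner with a confidence threshold would report or discard it depending on that threshold.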
