Closed Frequent Itemset Mining with Arbitrary Side Constraints

Frequent itemset mining (FIM) is a method for finding regularities in transaction databases. It has several application areas, such as market basket analysis, genome analysis, and drug design. Finding frequent itemsets allows further analysis to focus on a small subset of the data. For large datasets, however, the number of frequent itemsets can itself be very large, which defeats this purpose. Several extensions to FIM have therefore been studied, such as adding high-utility (or low-cost) constraints and finding only closed (or maximal) frequent itemsets. This paper presents a constraint-programming-based approach that combines arbitrary side constraints with closed frequent itemset mining. Our approach allows arbitrary side constraints to be expressed in a high-level, declarative language, which is then translated automatically for efficient solution by a SAT solver. We compare our approach with state-of-the-art algorithms via the MiningZinc system (where possible) and show significant contributions in terms of performance and applicability.
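To make the terminology concrete, the following is a minimal brute-force sketch of closed frequent itemset mining with a side constraint. It is not the SAT-based method of this paper: it simply enumerates every itemset, which is feasible only for toy data. The function names, the example transactions, and the item costs are hypothetical. It also assumes that closedness is checked only among itemsets that satisfy the side constraint, which is one possible reading of how the two conditions combine.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_frequent_itemsets(transactions, min_support,
                             side_constraint=lambda s: True):
    """Brute-force illustration (exponential in the number of items).

    Returns a dict mapping each itemset to its support, keeping only
    itemsets that are frequent, satisfy `side_constraint`, and have no
    strict superset with the same support among such itemsets.
    Assumption: closedness is evaluated relative to the constrained set.
    """
    items = sorted(set().union(*transactions))
    candidates = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup >= min_support and side_constraint(s):
                candidates[s] = sup
    return {
        s: sup
        for s, sup in candidates.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in candidates.items())
    }


# Hypothetical toy database and item costs, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"eggs"},
]
cost = {"bread": 2, "milk": 2, "eggs": 3}


def low_cost(itemset):
    """Example side constraint: total cost of the itemset is at most 5."""
    return sum(cost[i] for i in itemset) <= 5


unconstrained = closed_frequent_itemsets(transactions, min_support=2)
constrained = closed_frequent_itemsets(transactions, min_support=2,
                                       side_constraint=low_cost)
```

On this toy data, the unconstrained call keeps {eggs}, {bread, milk}, and {bread, milk, eggs}. Adding the cost constraint excludes the three-item set, so under the stated assumption {bread, eggs} and {milk, eggs} become closed. This shows that a side constraint can change which itemsets count as closed, rather than merely filtering the unconstrained result.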
