Pattern set mining with schema-based constraint

Pattern set mining entails discovering groups of frequent itemsets that represent potentially relevant knowledge. Global constraints are commonly enforced to focus the analysis on the most interesting pattern sets. However, these constraints evaluate and select each pattern set individually, based on the characteristics of its itemsets. This paper extends traditional global constraints by proposing a novel constraint, called the schema-based constraint, tailored to relational data. In relational data, itemsets consist of items belonging to distinct data attributes, which constitute the itemset schema. The schema-based constraint allows us to effectively combine all the itemsets that are semantically correlated with each other into a unique pattern set, while filtering out pattern sets that cover a mixture of different data facets or give only a partial view of a single facet. Specifically, it selects all the pattern sets that are (i) composed only of frequent itemsets with the same schema and (ii) of maximal size among those corresponding to that schema. Since existing approaches are unable to select one representative pattern set per schema in a single extraction, we propose a new Apriori-based algorithm that efficiently mines pattern sets satisfying the schema-based constraint. Experimental results on both real and synthetic datasets demonstrate the efficiency and effectiveness of our approach.
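To make conditions (i) and (ii) concrete, here is a minimal sketch, assuming that each relational record is a Python dict mapping attribute names to values and that an item is an (attribute, value) pair. Under this reading, the pattern set of maximal size for a schema is simply the set of all frequent itemsets with that schema. The function names (`schema_of`, `apriori_relational`, `schema_based_pattern_sets`) are hypothetical. This is a plain level-wise Apriori extraction followed by grouping by schema, not the authors' algorithm, which the abstract describes as enforcing the constraint within a single extraction.

```python
from collections import defaultdict
from itertools import combinations


def schema_of(itemset):
    """Schema of an itemset: the set of attributes its (attribute, value) items belong to."""
    return frozenset(attr for attr, _ in itemset)


def support(itemset, records):
    """Number of records that contain every (attribute, value) pair of the itemset."""
    return sum(all(r.get(a) == v for a, v in itemset) for r in records)


def apriori_relational(records, min_sup):
    """Level-wise (Apriori-style) mining of frequent itemsets over relational records.

    Candidates with two values of the same attribute are never generated,
    because a relational record holds exactly one value per attribute.
    """
    items = {(a, v) for r in records for a, v in r.items()}
    level = {frozenset([i]) for i in items if support([i], records) >= min_sup}
    frequent = set(level)
    k = 1
    while level:
        candidates = set()
        for x, y in combinations(level, 2):
            union = x | y
            # Join step: keep (k+1)-itemsets whose items span k+1 distinct attributes.
            if len(union) == k + 1 and len(schema_of(union)) == k + 1:
                # Prune step (Apriori property): every k-subset must be frequent.
                if all(union - {i} in level for i in union):
                    candidates.add(union)
        level = {c for c in candidates if support(c, records) >= min_sup}
        frequent |= level
        k += 1
    return frequent


def schema_based_pattern_sets(records, min_sup):
    """One pattern set per schema, holding all frequent itemsets with that schema.

    Each pattern set is composed only of itemsets sharing the same schema (i)
    and contains all of them, hence it has maximal size for that schema (ii).
    """
    groups = defaultdict(set)
    for itemset in apriori_relational(records, min_sup):
        groups[schema_of(itemset)].add(itemset)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical toy relation with two attributes.
    records = [
        {"city": "Turin", "gender": "F"},
        {"city": "Turin", "gender": "F"},
        {"city": "Milan", "gender": "M"},
        {"city": "Milan", "gender": "M"},
        {"city": "Milan", "gender": "F"},
    ]
    for schema, patterns in schema_based_pattern_sets(records, min_sup=2).items():
        print(sorted(schema), [sorted(p) for p in patterns])
```

On the five toy records with `min_sup=2`, the sketch returns three pattern sets, one each for the schemas {city}, {gender} and {city, gender}; the last contains {city=Turin, gender=F} and {city=Milan, gender=M}, while {city=Milan, gender=F} is discarded as infrequent. Grouping after extraction keeps the example short; a constraint-aware miner could instead maintain one pattern set per schema while it enumerates candidates.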
