Reducing redundancy in characteristic rule discovery by using integer programming techniques

The discovery of characteristic rules is a well-known data mining task and has led to several successful applications. However, because characteristic rules are descriptive in nature, a very large number of them is typically discovered during the mining stage. This makes monitoring and controlling these rules extremely costly and difficult in practice. Selecting the most promising subset of rules is therefore desirable. Several heuristic rule selection methods have been proposed in the literature to deal with this issue. In this paper, we propose an integer programming model that optimally selects the most promising subset of characteristic rules. Moreover, the proposed technique makes it possible to enforce a user-defined level of overall model quality while maximally reducing the redundancy present in the original ruleset. We use real-world data to empirically evaluate the benefits and performance of the proposed technique against the well-known RuleCover heuristic. The results demonstrate that the proposed integer programming techniques significantly reduce the number of retained rules and the level of redundancy in the final ruleset. They also show that the overall quality of the final ruleset, measured by its discriminant power, increases slightly when integer programming methods are used.
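To make the idea of rule selection as a 0-1 optimization problem concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes each rule is described only by the set of instances it covers, uses the covered fraction of instances as a stand-in for the user-defined quality level, and solves the tiny problem by exhaustive enumeration instead of an integer programming solver. The function name `select_rules`, the parameter `min_coverage`, and the toy data are all hypothetical.

```python
from itertools import combinations


def select_rules(coverage, n_instances, min_coverage=1.0):
    """Return a smallest subset of rules meeting a coverage threshold.

    Illustrative 0-1 selection problem (an assumption, not the paper's model):
        minimize   the number of selected rules
        subject to the selected rules together cover at least
                   min_coverage * n_instances instances.
    Solved by brute force, so it is only practical for a handful of rules.
    """
    rules = sorted(coverage)
    required = min_coverage * n_instances
    # Try subsets in order of increasing size; the first feasible one is minimal.
    for size in range(len(rules) + 1):
        for subset in combinations(rules, size):
            covered = set().union(*(coverage[r] for r in subset))
            if len(covered) >= required:
                return list(subset)
    return None  # no subset reaches the requested coverage


# Hypothetical toy ruleset: rule name -> indices of the instances it covers.
toy_coverage = {
    "r1": {0, 1, 2},
    "r2": {2, 3},
    "r3": {3, 4, 5},
    "r4": {0, 1},
    "r5": {4, 5},
}

full = select_rules(toy_coverage, n_instances=6)
half = select_rules(toy_coverage, n_instances=6, min_coverage=0.5)
print(full, half)
```

In this toy instance, rules `r2`, `r4`, and `r5` are redundant once `r1` and `r3` are retained, and lowering `min_coverage` trades coverage for a smaller ruleset. A realistic instance would replace the enumeration with a proper integer programming solver.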
