A General Measure of Rule Interestingness

This paper presents a new general measure of rule interestingness. Many known measures, such as chi-square, Gini gain, and entropy gain, can be obtained from it by setting a few numerical parameters, one of which expresses how much we trust the estimate of the data's probability distribution. Moreover, we show that there is a continuum of measures with chi-square, Gini gain, and entropy gain as boundary cases. The measure therefore generalizes both conditional and unconditional classical measures of interestingness. Properties and an experimental evaluation of the new measure are also presented.
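
For orientation, the three classical measures named above can be computed from a contingency table of counts. The sketch below uses their standard textbook definitions only; it does not implement the paper's generalized measure, whose parametric form is not given in this abstract. The function names and the table layout (one row per value of the rule antecedent, one column per class) are assumptions made for illustration.

```python
from math import log2


def _distribution(counts):
    """Normalize a list of counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * log2(q) for q in p if q > 0)


def gini(p):
    """Gini index of a probability distribution."""
    return 1.0 - sum(q * q for q in p)


def gain(table, impurity):
    """Impurity of the class distribution minus its expected impurity
    after conditioning on the row (antecedent) variable."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    prior = impurity(_distribution(col_totals))
    conditional = sum(
        (sum(row) / n) * impurity(_distribution(row))
        for row in table
        if sum(row) > 0  # skip empty rows to avoid division by zero
    )
    return prior - conditional


def chi_square(table):
    """Pearson chi-square statistic for independence of rows and columns."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = antecedent true/false, columns = class 1/class 2.
example = [[30, 10], [10, 30]]
chi2 = chi_square(example)
gini_gain = gain(example, gini)
entropy_gain = gain(example, entropy)
```

On the hypothetical table above, the antecedent shifts the class distribution from 50/50 to 75/25, so all three measures are positive; on a table whose rows share the same class proportions, all three are zero.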
