
Heuristics for Ranking the Interestingness of Discovered

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique and can therefore be considered a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of the diversity of a population: variance, the Simpson index, and the Shannon index. Using each of the proposed measures, we assign to a summary a single real value that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated.
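The abstract gives no formulas, so the following Python sketch is only an illustration under stated assumptions. It treats a summary as the list of counts attached to its unique tuples and normalizes them into probabilities p_i. It then scores the distribution with textbook forms of the three diversity indices named above: the variance of the p_i around the uniform value 1/m, the Simpson index Σ p_i², and the Shannon index −Σ p_i log₂ p_i. It assumes that a less diverse (more uneven) summary is more interesting, so summaries are ranked by descending variance and Simpson value and by ascending Shannon value. It does not reproduce the paper's exact four measures. The function names, the example summaries S1 to S3, and the use of Kendall's tau to compare rankings are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence


def to_probabilities(counts: Sequence[int]) -> List[float]:
    """Turn the per-tuple counts of a summary into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a summary needs at least one positive count")
    return [c / total for c in counts]


def variance_measure(p: Sequence[float]) -> float:
    """Assumed form: variance of the p_i around the uniform value 1/m, divided by m - 1."""
    m = len(p)
    if m < 2:
        return 0.0  # a single-tuple summary has no spread to measure
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)


def simpson_measure(p: Sequence[float]) -> float:
    """Simpson index: sum of p_i^2; larger when a few tuples dominate."""
    return sum(pi * pi for pi in p)


def shannon_measure(p: Sequence[float]) -> float:
    """Shannon index in bits: -sum p_i log2 p_i; larger for more even distributions."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def rank_summaries(summaries: Dict[str, Sequence[int]]) -> Dict[str, List[str]]:
    """Rank summary names from most to least interesting under each measure.

    Assumption: less diverse (more concentrated) summaries are more interesting.
    Ties keep insertion order because Python's sort is stable.
    """
    probs = {name: to_probabilities(c) for name, c in summaries.items()}
    return {
        "variance": sorted(probs, key=lambda n: variance_measure(probs[n]), reverse=True),
        "simpson": sorted(probs, key=lambda n: simpson_measure(probs[n]), reverse=True),
        "shannon": sorted(probs, key=lambda n: shannon_measure(probs[n])),
    }


def kendall_tau(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Kendall's tau between two complete rankings of the same items (no ties)."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    n = len(order_a)
    if n < 2:
        return 1.0
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # order_a places order_a[i] before order_a[j]; does order_b agree?
            if pos_b[order_a[i]] < pos_b[order_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical summaries: each list holds the counts of its unique tuples.
    summaries = {"S1": [25, 25, 25, 25], "S2": [70, 10, 10, 10], "S3": [40, 30, 20, 10]}
    ranks = rank_summaries(summaries)
    print(ranks["shannon"])  # ['S2', 'S3', 'S1']
    print(kendall_tau(ranks["variance"], ranks["shannon"]))  # 1.0
```

When summaries have the same number of tuples m, the variance and Simpson rankings always agree, because Σ(p_i − 1/m)² = Σ p_i² − 1/m. The indices can diverge only when summaries differ in size, which is where a rank correlation such as Kendall's tau becomes informative.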

Howard J. Hamilton, Robert J. Hilderman
