Novelty Framework for Knowledge Discovery in Databases

Knowledge Discovery in Databases (KDD) is an iterative process that aims to extract interesting, previously unknown, and hidden patterns from very large databases. Applying objective measures of interestingness in popular data mining algorithms often still yields so many rules that the user faces another data mining problem, albeit one of reduced complexity. Reducing the volume of discovered rules is therefore desirable because it improves the efficiency of the overall KDD process, and subjective measures of interestingness are required to achieve this. In this paper, we study the novelty of discovered rules as a subjective measure of interestingness. We propose a framework that quantifies the novelty of a discovered rule in terms of its deviations from the known rules. The computation uses the importance that the user assigns to different deviations. The resulting degree of novelty is then compared with a user-given threshold to decide which rules are reported to the user as novel. We implement the proposed framework and experiment with several public datasets; the experimental results are promising.
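The abstract does not say how deviations are measured or how the user's importance values are combined. The Python sketch below illustrates the general idea under explicitly hypothetical choices. Each rule is an antecedent/consequent pair of item sets. The deviation of each side from a known rule is the Jaccard distance. The user's importance for antecedent and consequent deviations acts as a weight. A rule's degree of novelty is its weighted deviation from the closest known rule. None of these choices, nor the names `Rule`, `set_deviation`, `degree_of_novelty`, or `report_novel`, come from the paper.

```python
from dataclasses import dataclass


# Hypothetical rule representation: antecedent => consequent, each a set of items.
@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset


def set_deviation(a: frozenset, b: frozenset) -> float:
    """Jaccard distance between two item sets (an assumed deviation measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rule_deviation(rule: Rule, known: Rule, weights: dict) -> float:
    """Weighted average of antecedent and consequent deviations.

    `weights` holds the user-given importance of each kind of deviation
    (hypothetical keys "antecedent" and "consequent").
    """
    w_a = weights["antecedent"]
    w_c = weights["consequent"]
    d_a = set_deviation(rule.antecedent, known.antecedent)
    d_c = set_deviation(rule.consequent, known.consequent)
    return (w_a * d_a + w_c * d_c) / (w_a + w_c)


def degree_of_novelty(rule: Rule, known_rules: list, weights: dict) -> float:
    """Deviation from the closest known rule (assumption: min aggregation)."""
    if not known_rules:
        return 1.0  # assumption: with no known rules, every rule is fully novel
    return min(rule_deviation(rule, k, weights) for k in known_rules)


def report_novel(discovered: list, known_rules: list, weights: dict, threshold: float) -> list:
    """Return (rule, score) pairs whose degree of novelty reaches the threshold."""
    result = []
    for r in discovered:
        score = degree_of_novelty(r, known_rules, weights)
        if score >= threshold:
            result.append((r, score))
    return result


def R(ante, cons):
    """Small helper to build a Rule from plain sets."""
    return Rule(frozenset(ante), frozenset(cons))


known = [R({"bread", "butter"}, {"milk"})]
discovered = [
    R({"bread", "butter"}, {"milk"}),  # identical to a known rule
    R({"bread", "butter"}, {"jam"}),   # same antecedent, different consequent
    R({"beer"}, {"diapers"}),          # unrelated to every known rule
]
# The user considers consequent deviations three times as important.
weights = {"antecedent": 1.0, "consequent": 3.0}

novel = report_novel(discovered, known, weights, threshold=0.5)
for rule, score in novel:
    print(sorted(rule.antecedent), "=>", sorted(rule.consequent), round(score, 2))
# prints:
# ['bread', 'butter'] => ['jam'] 0.75
# ['beer'] => ['diapers'] 1.0
```

Taking the minimum over known rules means a rule counts as novel only if it differs substantially from every known rule; other aggregations and deviation measures are equally plausible under the framework as described.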