Efficient Rule-Based Attribute-Oriented Induction for Data Mining

Data mining has become an important technique with tremendous potential in many commercial and industrial applications. Attribute-oriented induction is a powerful mining technique that has been successfully implemented in the data mining system DBMiner (Han et al., Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96), Portland, Oregon, 1996). However, its induction capability is limited by unconditional concept generalization. In this paper, we extend concept generalization to rule-based concept hierarchies, which greatly enhances its induction power. When the previously proposed induction algorithm is applied to the more general rule-based case, an induction anomaly occurs that impairs its efficiency. We have developed an efficient algorithm for induction in the rule-based case that avoids the anomaly. Performance studies show that the algorithm is superior to a previously proposed algorithm based on backtracking.
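To make the distinction between unconditional and rule-based generalization concrete, the following is a minimal Python sketch, not the algorithm developed in the paper. The attribute names (`status`, `gpa`), the thresholds, the concept labels, and the helper names `STATUS_PARENT`, `generalize_gpa`, and `induce` are all hypothetical, chosen only for illustration. The attribute `status` is generalized unconditionally through a fixed parent table, while `gpa` is generalized by rules whose outcome also depends on the student's status.

```python
from collections import Counter

# Unconditional concept hierarchy (hypothetical): each value has one fixed parent.
STATUS_PARENT = {
    "junior": "undergraduate",
    "senior": "undergraduate",
    "MSc": "graduate",
    "PhD": "graduate",
}


def generalize_gpa(row):
    """Rule-based generalization (hypothetical rules).

    The concept assigned to a GPA depends on the GPA itself and on the
    generalized status of the same tuple.
    """
    level = STATUS_PARENT[row["status"]]
    gpa = row["gpa"]
    if gpa >= 3.8:
        return "excellent"
    if gpa >= 3.5:
        return "excellent" if level == "undergraduate" else "good"
    if gpa >= 3.0:
        return "good" if level == "undergraduate" else "average"
    return "poor"


def induce(rows):
    """Generalize every tuple and merge identical results, keeping a count."""
    merged = Counter()
    for row in rows:
        # The rule is evaluated on the raw tuple, while the values its
        # condition needs are still available.
        key = (STATUS_PARENT[row["status"]], generalize_gpa(row))
        merged[key] += 1
    return merged


students = [
    {"status": "junior", "gpa": 3.6},
    {"status": "senior", "gpa": 3.9},
    {"status": "MSc", "gpa": 3.6},
    {"status": "PhD", "gpa": 3.7},
    {"status": "PhD", "gpa": 2.8},
]

result = induce(students)
for (status, gpa_concept), count in sorted(result.items()):
    print(status, gpa_concept, count)
```

In this sketch the same GPA of 3.6 generalizes to different concepts for an undergraduate and a graduate student, which a single unconditional hierarchy on `gpa` could not express.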
