Discovery of multiple-level rules from large databases

With the widespread computerization of business, government, and science, the efficient and effective discovery of interesting information from large databases has become essential. Data mining, or Knowledge Discovery in Databases (KDD), has emerged as a solution to the data analysis problems faced by many organizations. Previous studies on data mining have focused on discovering knowledge at a single conceptual level, either at the primitive level or at a rather high conceptual level. However, it is often desirable to discover knowledge at multiple conceptual levels, which provides a spectrum of understanding of the underlying data, from general to specific.

In this thesis, we first introduce conceptual hierarchies, which organize the data in a database hierarchically. We develop two algorithms for the dynamic adjustment of conceptual hierarchies and a third for the automatic generation of conceptual hierarchies for numerical attributes. In addition, we develop a set of algorithms for mining multiple-level characteristic, discriminant, and association rules. All of these algorithms were implemented and tested in our data mining prototype system, DBMiner. The attribute-oriented induction method is extended to discover multiple-level characteristic and discriminant rules, and a progressive deepening method is proposed for mining multiple-level association rules. Several variants of the progressive deepening method, each using different optimization techniques, were implemented and tested, and the results show that the method is efficient and effective. Furthermore, we propose a new approach to association rule mining, meta-rule guided mining, and our experiments show that it is powerful and efficient. Finally, we present an application of data mining techniques: cooperative query answering using multiple layered databases.

Our study concludes that mining knowledge at multiple levels is both practical and desirable, and is therefore an interesting research direction. Some future research problems are also discussed.
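The progressive deepening idea for multiple-level association rules can be illustrated with a small sketch. It assumes, hypothetically, that each item is encoded as its path in a concept hierarchy (so `("milk", "2%")` is a level-2 item whose level-1 ancestor is `("milk",)`), that each level has its own relative support threshold, and that plain Apriori is used within a single level. The function names, data layout, and thresholds are illustrative only and are not taken from the thesis or from DBMiner. Mining starts at the most general level, and at each deeper level only items whose parent was frequent one level up are examined.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: an item is its path from the root of a concept
# hierarchy, so ("milk", "2%") is a level-2 item whose level-1 ancestor is ("milk",).


def frequent_itemsets(transactions, min_count):
    """Plain Apriori over transactions whose items all sit at one concept level."""
    item_counts = Counter(i for t in transactions for i in t)
    current = {frozenset([i]) for i, c in item_counts.items() if c >= min_count}
    result = {s: item_counts[next(iter(s))] for s in current}
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c for c, n in counts.items() if n >= min_count}
        result.update((c, counts[c]) for c in current)
        k += 1
    return result


def mine_multilevel(transactions, min_support, depth):
    """Top-down, level-by-level mining with a progressive deepening filter.

    min_support[k - 1] is the (hypothetical) relative support threshold at
    level k; smaller thresholds at deeper levels let specific patterns surface.
    """
    n = len(transactions)
    frequent_parents = None  # frequent single items found at the previous level
    results = {}
    for level in range(1, depth + 1):
        generalized = []
        for t in transactions:
            # Replace each leaf item by its ancestor at the current level.
            items = {i[:level] for i in t if len(i) >= level}
            # Progressive deepening: only descend below items that were frequent.
            if frequent_parents is not None:
                items = {i for i in items if i[:-1] in frequent_parents}
            generalized.append(frozenset(items))
        itemsets = frequent_itemsets(generalized, min_support[level - 1] * n)
        results[level] = itemsets
        frequent_parents = {next(iter(s)) for s in itemsets if len(s) == 1}
        if not frequent_parents:
            break  # the filter would now exclude every deeper item
    return results


# Toy usage with made-up transactions (leaf items at level 2).
transactions = [
    {("milk", "2%"), ("bread", "wheat")},
    {("milk", "skim"), ("bread", "white")},
    {("milk", "2%"), ("bread", "wheat"), ("juice", "apple")},
    {("milk", "2%")},
]
levels = mine_multilevel(transactions, min_support=[0.75, 0.5], depth=2)
for level, itemsets in levels.items():
    for itemset, count in sorted(itemsets.items(), key=lambda kv: -kv[1]):
        print(level, sorted(itemset), count)
```

With this toy data, `("juice",)` is infrequent at level 1, so `("juice", "apple")` is filtered out before level 2 is counted, while milk and bread survive and are refined into `("milk", "2%")` and `("bread", "wheat")`. The lower threshold at level 2 (0.5 instead of 0.75) reflects the assumption that more specific items tend to have lower support than their ancestors.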
