Concept-Based Data Classification in Relational Databases

Data classification is the process of grouping objects that share common properties into classes, producing a classification scheme over a set of data objects. It is useful for understanding and organizing the data in a database and for building hierarchical schemes. We investigate data classification in relational databases and develop a classification method based on concept-based generalization. The method applies an attribute-oriented generalization technique that uses knowledge about data concepts, integrates the classification process with relational operations, and provides an efficient way to classify data in relational databases. The characteristics of each class can be extracted automatically during classification. Moreover, quantitative information can be registered during generalization to support classification based on database statistics. Our analysis of the classification algorithms shows that the attribute-oriented approach substantially reduces the complexity of data classification in large databases.
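
The attribute-oriented idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the concept hierarchy, the attribute names (major, birthplace), the sample tuples, and the helper functions generalize and classify are all hypothetical, and the sketch assumes a single level of generalization. Each attribute value is replaced by its parent concept, tuples are grouped by the generalized classifying attribute, and identical generalized tuples are merged while a count is kept as the registered quantitative information.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: value -> parent concept.
CONCEPT_PARENT = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
    "physics": "science",
    "biology": "science",
    "music": "art",
    "history": "art",
}


def generalize(value):
    """Return the parent concept of a value, or the value itself if none is known."""
    return CONCEPT_PARENT.get(value, value)


def classify(rows, class_attr, other_attrs):
    """Group tuples by the generalized classifying attribute.

    Within each class, identical generalized tuples are merged and
    a count records how many original tuples each one covers.
    """
    classes = {}
    for row in rows:
        label = generalize(row[class_attr])
        key = tuple(generalize(row[attr]) for attr in other_attrs)
        classes.setdefault(label, Counter())[key] += 1
    return classes


# Illustrative data only; not taken from the paper.
rows = [
    {"major": "physics", "birthplace": "Vancouver"},
    {"major": "biology", "birthplace": "Victoria"},
    {"major": "music", "birthplace": "Toronto"},
    {"major": "history", "birthplace": "Ottawa"},
    {"major": "physics", "birthplace": "Toronto"},
]

result = classify(rows, "major", ["birthplace"])
for label, generalized in result.items():
    for values, count in generalized.items():
        print(label, values, count)
```

In this sketch the counts attached to each generalized tuple play the role of the quantitative information mentioned above: they show how much of each class a given characteristic covers. Because generalization is applied attribute by attribute rather than by comparing tuples pairwise, the work grows with the number of tuples and attributes, which is the intuition behind the complexity claim.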
