Generalization and decision tree induction: efficient classification in data mining

Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration of efficient induction in large databases and the analysis of data at multiple abstraction levels. The paper addresses the efficiency and scalability issues by proposing a data classification method which integrates attribute oriented induction, relevance analysis, and the induction of decision trees. Such an integration leads to efficient, high quality, multiple level classification of large amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous and noisy data.

[1]  Lawrence B. Holder,et al.  Intermediate Decision Trees , 1995, IJCAI.

[2]  Rakesh Agrawal,et al.  SPRINT: A Scalable Parallel Classifier for Data Mining , 1996, VLDB.

[3]  Jiawei Han,et al.  Attribute-Oriented Induction in data Mining , 1996, Advances in Knowledge Discovery and Data Mining.

[4]  Gordon Cook,et al.  11. Applied Categorical Data Analysis , 1988 .

[5]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[6]  Tomasz Imielinski,et al.  An Interval Classifier for Database Mining Applications , 1992, VLDB.

[7]  J. Ross Quinlan,et al.  Learning Efficient Classification Procedures and Their Application to Chess End Games , 1983 .

[8]  Gregory Piatetsky-Shapiro,et al.  Knowledge Discovery in Databases: An Overview , 1992, AI Mag..

[9]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[10]  Jiawei Han,et al.  DBMiner: A System for Mining Knowledge in Large Relational Databases , 1996, KDD.

[11]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[12]  金田 重郎,et al.  C4.5: Programs for Machine Learning (書評) , 1995 .

[13]  William Frawley,et al.  Knowledge Discovery in Databases , 1991 .

[14]  Padhraic Smyth,et al.  Knowledge Discovery and Data Mining: Towards a Unifying Framework , 1996, KDD.

[15]  Thomas G. Dietterich,et al.  Learning with Many Irrelevant Features , 1991, AAAI.

[16]  W. Scott Spangler,et al.  Learning Useful Rules from Inconclusive Data , 1991, Knowledge Discovery in Databases.

[17]  Usama M. Fayyad,et al.  Automating the Analysis and Cataloging of Sky Surveys , 1996, Advances in Knowledge Discovery and Data Mining.

[18]  Herman J. Loether,et al.  Descriptive and inferential statistics: An introduction , 1980 .

[19]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[20]  Raymond J. Mooney,et al.  An Experimental Comparison of Symbolic and Connectionist Learning Algorithms , 1989, IJCAI.

[21]  Jiawei Han,et al.  Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases , 1994, KDD Workshop.

[22]  Jiawei Han,et al.  Data-Driven Discovery of Quantitative Rules in Relational Databases , 1993, IEEE Trans. Knowl. Data Eng..

[23]  D. Louis Collins,et al.  Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images , 1995, IEEE Trans. Medical Imaging.

[24]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[25]  Shirley Dex,et al.  JR 旅客販売総合システム(マルス)における運用及び管理について , 1991 .

[26]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[27]  Sholom M. Weiss,et al.  An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods , 1989, IJCAI.

[28]  Usama M. Fayyad,et al.  Branching on Attribute Values in Decision Tree Generation , 1994, AAAI.

[29]  MingersJohn An Empirical Comparison of Pruning Methods for Decision Tree Induction , 1989 .

[30]  Thomas G. Dietterich,et al.  Learning Boolean Concepts in the Presence of Many Irrelevant Features , 1994, Artif. Intell..