Recursive Decision Tree Induction Based on Homogeneousness for Data Clustering

Data mining is an analytic process designed to explore data in search of consistent patterns or systematic relationships between variables. Both supervised and unsupervised learning techniques are used to build data mining models. In this paper we apply a supervised learning technique, the classification tree (commonly called a decision tree), to group data with similar features in large datasets. The algorithm takes an image of the plotted data values as input and induces a decision tree from it. The criterion for splitting a region while growing the tree is a measure of the homogeneousness of the data pixels in that region. Leaf nodes are then reverse-merged to form clusters based on their domain density.
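The pipeline described above (image in, homogeneousness-driven recursive splitting, leaf merging by density) can be sketched roughly as follows. This is a minimal illustration built on assumptions the abstract does not state. The image is treated as a binary pixel grid (1 = pixel covered by a plotted data point). Homogeneousness is taken to be the share of the majority pixel value in a region. A region is split at the midpoint of its longer side until that share reaches a hypothetical `purity` threshold. "Reverse merging" is read as joining edge-adjacent leaves whose density is at least a hypothetical `min_density`. The names `induce`, `merge_leaves`, `purity`, and `min_density` are illustrative and are not the authors' own.

```python
from dataclasses import dataclass, field
from typing import List

Grid = List[List[int]]  # 1 = pixel covered by a plotted data point, 0 = empty


@dataclass
class Node:
    # Region is the half-open rectangle rows [r0, r1) x cols [c0, c1).
    r0: int
    r1: int
    c0: int
    c1: int
    density: float  # fraction of pixels in the region that are 1
    children: List["Node"] = field(default_factory=list)


def region_density(grid: Grid, r0: int, r1: int, c0: int, c1: int) -> float:
    total = (r1 - r0) * (c1 - c0)
    filled = sum(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return filled / total


def is_homogeneous(density: float, purity: float) -> bool:
    # Assumed homogeneousness test: region is (nearly) all data or all empty.
    return max(density, 1.0 - density) >= purity


def induce(grid: Grid, r0: int, r1: int, c0: int, c1: int,
           purity: float = 0.9, min_size: int = 1) -> Node:
    """Recursively split a region until it is homogeneous (hypothetical rule)."""
    d = region_density(grid, r0, r1, c0, c1)
    node = Node(r0, r1, c0, c1, d)
    h, w = r1 - r0, c1 - c0
    if is_homogeneous(d, purity) or max(h, w) <= min_size:
        return node  # leaf
    # Assumed split rule: cut the longer side at its midpoint.
    if h >= w:
        mid = r0 + h // 2
        node.children = [induce(grid, r0, mid, c0, c1, purity, min_size),
                         induce(grid, mid, r1, c0, c1, purity, min_size)]
    else:
        mid = c0 + w // 2
        node.children = [induce(grid, r0, r1, c0, mid, purity, min_size),
                         induce(grid, r0, r1, mid, c1, purity, min_size)]
    return node


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def touching(a: Node, b: Node) -> bool:
    # True if the two rectangles share part of an edge (corners do not count).
    rows_overlap = a.r0 < b.r1 and b.r0 < a.r1
    cols_overlap = a.c0 < b.c1 and b.c0 < a.c1
    return ((rows_overlap and (a.c1 == b.c0 or b.c1 == a.c0)) or
            (cols_overlap and (a.r1 == b.r0 or b.r1 == a.r0)))


def merge_leaves(root: Node, min_density: float = 0.5) -> List[List[tuple]]:
    """Join edge-adjacent dense leaves into clusters with union-find."""
    dense = [leaf for leaf in leaves(root) if leaf.density >= min_density]
    parent = list(range(len(dense)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(dense)):
        for j in range(i + 1, len(dense)):
            if touching(dense[i], dense[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, leaf in enumerate(dense):
        groups.setdefault(find(i), []).append((leaf.r0, leaf.r1, leaf.c0, leaf.c1))
    return list(groups.values())


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1],
    ]
    root = induce(image, 0, 4, 0, 4)
    for cluster in merge_leaves(root):
        print(cluster)
```

On this 4×4 example the top-left block becomes one cluster, and the two dense leaves in the bottom-right corner are merged into a second cluster. The midpoint split keeps the sketch short; the paper may choose split positions and the homogeneousness measure differently.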