Discovering conjecturable rules through tree-based clustering analysis

We present a clustering technique for discovering conjecturable rules from datasets that have no predefined class label. The technique uses different attributes for clustering objects and for building clustering trees. Similarity between objects is determined using a k-nearest neighbors graph, which accommodates both numerical and categorical attributes. The technique combines the convenience of unsupervised learning with the predictive ability of decision trees. It is an unsupervised method consisting of two steps: (a) constructing the k-nearest neighbors graph; (b) building the clustering tree (Clus-Tree). We illustrate the use of our algorithm with an example.
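To make step (a) concrete, the following is a minimal sketch of constructing a k-nearest neighbors graph over objects with mixed numerical and categorical attributes. The abstract does not specify the distance measure, so a Gower-style dissimilarity (range-normalized difference for numeric attributes, simple mismatch for categorical ones) is assumed here purely for illustration; the function names `mixed_distance` and `knn_graph` are hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity over a mixed-attribute tuple pair.

    Numeric attributes (those listed in numeric_ranges) contribute a
    range-normalized absolute difference; categorical attributes
    contribute 0 on a match and 1 on a mismatch. The result is the
    average contribution, so it lies in [0, 1].
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            lo, hi = numeric_ranges[i]
            span = (hi - lo) or 1.0  # guard against zero range
            total += abs(x - y) / span
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def knn_graph(objects, k, numeric_attrs):
    """Build a k-nearest neighbors graph as an adjacency dict.

    objects: list of attribute tuples (numeric or categorical values).
    numeric_attrs: set of attribute indices to treat as numeric.
    Returns {object_index: [indices of its k nearest neighbors]}.
    """
    # Precompute per-attribute ranges for normalizing numeric distances.
    numeric_ranges = {
        i: (min(o[i] for o in objects), max(o[i] for o in objects))
        for i in numeric_attrs
    }
    graph = {}
    for i, a in enumerate(objects):
        dists = sorted(
            (mixed_distance(a, b, numeric_ranges), j)
            for j, b in enumerate(objects) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph
```

Edges of this graph would then serve as the similarity structure that step (b) partitions when building the clustering tree; how the tree splits on attributes is not detailed in the abstract.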
