Decoupling of clustering and classification steps in a cluster-based classification

The application of cluster analysis to classification is well known. Such an application takes place in two steps: "clustering" and "classification". In the clustering step, the objects of a training set are clustered using a clustering technique, Q. The outcome is a set of clusters, C. Each cluster, ci, is assigned a class label, ki, which reflects the common features of the objects in ci; each ki is a member of the set K. In the classification step, a new object from a test set is assigned to one of the clusters in C using the Q, C, and K of the clustering step. The goal of this research effort is twofold: (1) to introduce a methodology for decoupling the "clustering" and "classification" steps and (2) to establish the validity of the proposed methodology by comparing its classification performance with that of the rough sets approach and discriminant analysis.
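
The two-step scheme described above can be illustrated with a minimal sketch. It is not the methodology proposed in this work: plain k-means stands in for the clustering technique Q, the majority class of a cluster's training objects stands in for its label ki, and nearest-centroid assignment stands in for the classification step. The function names (`cluster`, `label_clusters`, `classify`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def cluster(points, k, iterations=20):
    """Assumed stand-in for Q: plain k-means seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def label_clusters(points, labels, centroids):
    """Build K: each cluster takes the majority class of its training objects (an assumption)."""
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def classify(point, centroids, cluster_labels):
    """Classification step: assign the object to its nearest cluster and return that cluster's label."""
    return cluster_labels[nearest(point, centroids)]


# Toy training set with two well-separated classes (illustrative data only).
train_x = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
train_y = ["neg", "pos", "neg", "pos"]

C = cluster(train_x, k=2)                # clustering step: C
K = label_clusters(train_x, train_y, C)  # cluster labels: K
prediction = classify((4.8, 5.0), C, K)  # classification step for a new object
```

In this sketch the classification step reuses the same distance measure as the clustering step, which is the coupling the abstract refers to: changing Q changes how new objects are assigned.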
