CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE

Dichotomous data is a type of categorical data that is binary, with categories zero and one. Health care is one of the domains that relies most heavily on categorical data. Binary data is the simplest form of data stored in health care databases, where it typically arises from closed-ended questions, and it represents categorical data efficiently in terms of both computation and memory. Clustering health care or medical data is difficult because of its complex data representation models, high dimensionality, and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real-valued data by the Wiener transformation. The proposed algorithm can be used to determine the correlation between health disorders and symptoms observed in large binary medical and health databases. Computational results show that clustering based on the Wiener transformation is very efficient in terms of both objectivity and subjectivity.
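
The transform-then-cluster idea can be illustrated with a minimal sketch. The abstract does not define the Wiener transformation, so the sketch assumes it takes the form of the classic adaptive Wiener filter (local mean and variance over a sliding window, with the noise level estimated as the average local variance), applied along the attributes of each binary record. The clustering step here is a plain Lloyd-style k-means with a deterministic farthest-point initialisation. The function names, the window size, and the toy records are all hypothetical, chosen only for illustration, and this is not necessarily the procedure used in the paper.

```python
def wiener_transform(row, window=3):
    """Map one binary record to real values (assumed adaptive Wiener form)."""
    n = len(row)
    half = window // 2
    means, variances = [], []
    for i in range(n):
        # Sliding window, truncated at the record boundaries.
        win = row[max(0, i - half):min(n, i + half + 1)]
        m = sum(win) / len(win)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in win) / len(win))
    # Assumption: noise power is the mean of the local variances.
    noise = sum(variances) / n
    out = []
    for x, m, v in zip(row, means, variances):
        if v <= noise:
            out.append(m)  # flat neighbourhood: fall back to the local mean
        else:
            out.append(m + (v - noise) / v * (x - m))
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd-style k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy records: rows are patients, columns are yes/no symptoms.
records = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
transformed = [wiener_transform(r) for r in records]
labels, centers = kmeans(transformed, k=2)
print(labels)
```

Because the window slides across neighbouring attributes, the result of this sketch depends on the column order, which is a limitation of the assumed form rather than a claim about the proposed algorithm.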
