DECA: A Discrete-Valued Data Clustering Algorithm

This paper presents a new clustering algorithm for analyzing unordered discrete-valued data. The algorithm consists of a cluster initiation phase and a sample regrouping phase. The first phase is based on a data-directed valley detection process that uses the optimal second-order product approximation of a high-order discrete probability distribution, together with a distance measure for discrete-valued data. The second phase iteratively applies Bayes' decision rule based on the subgroup discrete distributions. Because probability is its major decision criterion, the proposed method reduces the sensitivity of its solutions to the arbitrary choice of distance measure. The performance of the proposed algorithm is evaluated by applying it to four different sets of simulated data and a set of clinical data. For performance comparison, the decision-directed algorithm [11] is also applied to the same data sets. These evaluation experiments demonstrate the validity and the operational feasibility of the proposed algorithm and its superiority over the decision-directed algorithm.
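The sample regrouping phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `regroup`, its parameters, and the Laplace smoothing constant `alpha` are hypothetical, and each subgroup distribution is estimated here with a simpler first-order (independence) product rather than the second-order product approximation named in the abstract. The initial labels are assumed to come from some cluster initiation step.

```python
import math
from collections import Counter


def regroup(samples, labels, n_values, max_iter=20, alpha=1.0):
    """Iteratively reassign samples to clusters by Bayes' decision rule.

    samples  : list of tuples of ints; feature j takes values in range(n_values[j])
    labels   : initial cluster label for each sample (assumed given)
    n_values : number of distinct values of each feature
    alpha    : Laplace smoothing constant (an assumption of this sketch)
    """
    labels = list(labels)
    clusters = sorted(set(labels))
    n = len(samples)
    d = len(n_values)

    for _ in range(max_iter):
        # Estimate cluster sizes and per-feature value counts from current labels.
        sizes = Counter(labels)
        counts = {c: [Counter() for _ in range(d)] for c in clusters}
        for x, c in zip(samples, labels):
            for j, v in enumerate(x):
                counts[c][j][v] += 1

        def log_posterior(x, c):
            # log P(c) + sum_j log P(x_j | c), assuming independent features.
            lp = math.log((sizes[c] + alpha) / (n + alpha * len(clusters)))
            for j, v in enumerate(x):
                lp += math.log(
                    (counts[c][j][v] + alpha) / (sizes[c] + alpha * n_values[j])
                )
            return lp

        # Bayes' decision rule: assign each sample to its most probable cluster.
        new_labels = [
            max(clusters, key=lambda c: log_posterior(x, c)) for x in samples
        ]
        if new_labels == labels:  # stop once no sample changes cluster
            break
        labels = new_labels

    return labels
```

Calling `regroup(samples, initial_labels, n_values=[2, 2, 2])` on a small set of binary samples returns the relabelled list. Because assignments depend only on estimated probabilities, no distance measure enters this phase.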

[1]  Martin D. Levine, et al.  An Algorithm for Detecting Unimodal Fuzzy Sets and Its Application as a Clustering Technique, 1970, IEEE Transactions on Computers.

[2]  C. N. Liu, et al.  Approximating discrete probability distributions with dependence trees, 1968, IEEE Trans. Inf. Theory.

[3]  K. Fu, et al.  On mode estimation in pattern recognition, 1968.

[4]  Charles T. Zahn, et al.  Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, 1971, IEEE Transactions on Computers.

[5]  J. Hartigan.  Direct Clustering of a Data Matrix, 1972.

[6]  Richard O. Duda, et al.  Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[7]  Geoffrey H. Ball, et al.  Data analysis in the social sciences: what about the details?, 1965, AFIPS '65 (Fall, part I).

[8]  Keinosuke Fukunaga, et al.  A Nonparametric Valley-Seeking Technique for Cluster Analysis, 1971, IEEE Transactions on Computers.

[9]  Andrew K. C. Wong, et al.  A Decision-Directed Clustering Algorithm for Discrete Data, 1977, IEEE Transactions on Computers.

[10]  Philip M. Lewis, et al.  Approximating Probability Distributions to Reduce Storage Requirements, 1959, Information and Control.

[11]  Anil K. Jain, et al.  Clustering techniques: The user's dilemma, 1976, Pattern Recognit.

[12]  P. Sneath, et al.  Numerical Taxonomy, 1962, Nature.

[13]  J. Gower, et al.  Minimum Spanning Trees and Single Linkage Cluster Analysis, 1969.

[14]  Josef Kittler, et al.  A locally sensitive method for cluster analysis, 1976, Pattern Recognit.

[15]  James C. Stoffel, et al.  A Classifier Design Technique for Discrete Variable Pattern Recognition Problems, 1974, IEEE Transactions on Computers.