Paper information - An Automatic Clustering Algorithm and Its Properties in High-Dimensional Spaces

An economical technique for approximating a joint N-dimensional probability density function has been described by Sebestyen and Edie [20]. The algorithm searches for clusters of points and treats each cluster as one hyperellipsoidal cell in an N-dimensional histogram. Among the advantages of this scheme are: 1) the histogram cell descriptors (location, shape, and size) can be determined adaptively from sequentially introduced data samples of known classification, and 2) the number of cells required for a good fit can usually be kept small. No assumptions are required about the underlying statistical structure of the data. The algorithm requires three types of "control parameters," which critically affect its performance and depend on the number of dimensions. These parameters control the birth, shape, and growth rate of the cells. Guides for choosing the control parameter values were presented in [20]. These guides functioned well for spaces of three or fewer dimensions but did not yield usable values for spaces of higher dimensionality. This paper presents heuristics developed to automate the selection of the control parameters. The properties of these parameters were studied as a function of dimension. Two of the control parameters were found to be linearly related to dimension, which provides a method for determining their values by extrapolation and thereby avoids a great deal of computation.
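The extrapolation idea in the last sentence of the abstract can be illustrated with a minimal sketch. It assumes that a control parameter has already been tuned in a few low-dimensional spaces and that its value is linear in the dimension. The function names and the numbers below are hypothetical and do not come from the paper; they only show the mechanics of fitting a line and reading off a value for a higher dimension.

```python
def fit_line(dims, values):
    """Ordinary least-squares fit of values = slope * dims + intercept."""
    n = len(dims)
    mean_d = sum(dims) / n
    mean_v = sum(values) / n
    sxx = sum((d - mean_d) ** 2 for d in dims)
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(dims, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_d
    return slope, intercept


def extrapolate(dims, values, target_dim):
    """Estimate the parameter value at target_dim from the fitted line."""
    slope, intercept = fit_line(dims, values)
    return slope * target_dim + intercept


# Hypothetical tuned values of one control parameter in 1 to 4 dimensions
# (illustrative numbers only, not results from the paper).
tuned_dims = [1, 2, 3, 4]
tuned_values = [1.5, 2.0, 2.5, 3.0]

# Estimate the value for a 10-dimensional space without re-tuning there.
estimate = extrapolate(tuned_dims, tuned_values, 10)
print(estimate)
```

The point of such a fit is that the costly tuning is done only in low-dimensional spaces; whether a straight line is adequate for a given parameter has to be checked against the tuned values themselves.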

Anthony N. Mucciardi | Earl E. Gose

[1] C. Chow. A class of nonlinear recognition procedures, 1966.

[2] G. N. Lance, et al. A general theory of classificatory sorting strategies: II. Clustering systems, 1967, Comput. J.

[3] E. Gose. Introduction to Biological and Mechanical Pattern Recognition, 1969.

[4] Karen Spärck Jones, et al. Current approaches to classification and clump-finding at the Cambridge Language Research Unit, 1967, Comput. J.

[5] Anthony N. Mucciardi, et al. A Comparison of Seven Techniques for Choosing Subsets of Pattern Recognition Properties, 1971, IEEE Transactions on Computers.

[6] Robin Sibson, et al. The Construction of Hierarchic and Non-Hierarchic Classifications, 1968, Comput. J.

[7] G. Sebestyen, et al. An Algorithm for Non-Parametric Pattern Recognition, 1966, IEEE Trans. Electron. Comput.

[8] E. Gose, et al. Classification of benign and malignant breast tumors on the basis of 36 radiographic properties, 1973, Cancer.

[9] Geoffrey H. Ball, et al. Data analysis in the social sciences: what about the details?, 1965, AFIPS '65 (Fall, part I).

[10] Jay Edie, et al. Pattern Recognition Research, 1963.