Measurement of Clustering Tendency

Abstract

Determining the structure of multi-dimensional data is an important problem in exploratory data analysis and pattern recognition. Clustering methods have been used extensively for this purpose. However, clustering algorithms will locate and specify clusters in data even if none are present. It is therefore appropriate to measure the clustering tendency, or departure from randomness, of a data set before subjecting it to a clustering algorithm. Hopkins' method of testing for randomness is extended to higher dimensions and is tested against data from clustered and hard-core processes, along with the Fisher Iris data. As in the two-dimensional case, it appears to be a powerful test for clustering tendency.
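As an illustration of the kind of test discussed here, the following is a minimal sketch of a Hopkins-type statistic in d dimensions. It uses a commonly cited formulation and is not necessarily the exact form developed in this paper. Several choices are assumptions made for the example: the function names `nearest_distance` and `hopkins_statistic` are hypothetical, the sampling window is taken to be the axis-aligned bounding box of the data, and distances are Euclidean. The statistic compares distances from m random probe locations to their nearest data point (U) with distances from m randomly chosen data points to their nearest neighbour (W), each raised to the power d.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest element of `others`."""
    return min(math.dist(point, q) for q in others)


def hopkins_statistic(data, m, rng):
    """Hopkins-type statistic H = sum(U^d) / (sum(U^d) + sum(W^d)).

    data : list of equal-length coordinate sequences (n points, d dimensions)
    m    : number of probe locations and of sampled data points (0 < m < n)
    rng  : a random.Random instance, passed in for reproducibility

    Assumption: the sampling window is the bounding box of the data.
    """
    n = len(data)
    d = len(data[0])
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")

    # Bounding box of the data, used as the sampling window.
    lows = [min(p[k] for p in data) for k in range(d)]
    highs = [max(p[k] for p in data) for k in range(d)]

    # U: distance from each uniform probe location to its nearest data point.
    u_sum = 0.0
    for _ in range(m):
        probe = [rng.uniform(lows[k], highs[k]) for k in range(d)]
        u_sum += nearest_distance(probe, data) ** d

    # W: distance from each sampled data point to its nearest other data point.
    w_sum = 0.0
    for i in rng.sample(range(n), m):
        rest = data[:i] + data[i + 1:]
        w_sum += nearest_distance(data[i], rest) ** d

    return u_sum / (u_sum + w_sum)
```

Under this formulation, values near 0.5 are consistent with spatial randomness, values approaching 1 suggest clustering, and values well below 0.5 suggest regular (for example, hard-core) spacing. The brute-force nearest-neighbour search costs O(nm) distance evaluations, which is adequate for small data sets such as the Iris data but would typically be replaced by a spatial index for larger ones.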