Clustering with Size Constraints

We consider the problem of partitioning a data set of n data objects into c homogeneous subsets or clusters (that is, data objects in the same subset should be similar to each other) under constraints on the number of data objects per cluster. The proposed techniques can be used for various purposes. If a set of items, jobs or customers has to be distributed among a limited number of resources and the workload of each resource is to be balanced, clusters of approximately the same size are needed. If the resources have different capacities, clusters of the corresponding sizes need to be found. We also extend our approach to avoid extremely small or large clusters in standard cluster analysis. Another extension offers a measure for comparing different prototype-based clustering results.